Extreme Value Monte Carlo Tree Search for Classical Planning

Authors

  • Masataro Asai IBM Research / MIT-IBM Watson AI Lab
  • Stephen Wissow University of New Hampshire

DOI:

https://doi.org/10.1609/aaai.v40i43.40934

Abstract

Despite being successful in board games and reinforcement learning (RL), Monte Carlo Tree Search (MCTS) combined with Multi Armed Bandit (MAB) has seen limited success in domain-independent classical planning until recently. Previous work (Wissow and Asai, 2024) showed that UCB1, designed for bounded rewards, does not perform well as applied to cost-to-go estimates in classical planning, because cost-to-go estimates are unbounded, and showed improved performance using a Gaussian reward MAB instead. This paper further sharpens our understanding of ideal bandits for planning tasks. Existing work has two issues: first, Gaussian MABs under-specify the support of cost-to-go estimates as (-∞, ∞), which we can narrow down. Second, Full Bellman backup (Schulte and Keller, 2014) that backpropagates sample max/min lacks theoretical justification. We use Peaks-Over-Threashold Extreme Value Theory to resolve both issues at once, propose a new bandit algorithm (UCB1-Uniform). We formally prove its regret bound and empirically demonstrate its performance in classical planning.

Published

2026-03-14

How to Cite

Asai, M., & Wissow, S. (2026). Extreme Value Monte Carlo Tree Search for Classical Planning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(43), 36163–36171. https://doi.org/10.1609/aaai.v40i43.40934

Issue

Section

AAAI Technical Track on Planning, Routing, and Scheduling