Extreme Value Monte Carlo Tree Search for Classical Planning
DOI:
https://doi.org/10.1609/aaai.v40i43.40934Abstract
Despite being successful in board games and reinforcement learning (RL), Monte Carlo Tree Search (MCTS) combined with Multi Armed Bandit (MAB) has seen limited success in domain-independent classical planning until recently. Previous work (Wissow and Asai, 2024) showed that UCB1, designed for bounded rewards, does not perform well as applied to cost-to-go estimates in classical planning, because cost-to-go estimates are unbounded, and showed improved performance using a Gaussian reward MAB instead. This paper further sharpens our understanding of ideal bandits for planning tasks. Existing work has two issues: first, Gaussian MABs under-specify the support of cost-to-go estimates as (-∞, ∞), which we can narrow down. Second, Full Bellman backup (Schulte and Keller, 2014) that backpropagates sample max/min lacks theoretical justification. We use Peaks-Over-Threashold Extreme Value Theory to resolve both issues at once, propose a new bandit algorithm (UCB1-Uniform). We formally prove its regret bound and empirically demonstrate its performance in classical planning.Downloads
Published
2026-03-14
How to Cite
Asai, M., & Wissow, S. (2026). Extreme Value Monte Carlo Tree Search for Classical Planning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(43), 36163–36171. https://doi.org/10.1609/aaai.v40i43.40934
Issue
Section
AAAI Technical Track on Planning, Routing, and Scheduling