Extreme Value Monte Carlo Tree Search (Extended Abstract)
DOI: https://doi.org/10.1609/socs.v17i1.31569

Abstract
Monte-Carlo Tree Search (MCTS) combined with Multi-Armed Bandits (MABs) had seen limited success in domain-independent classical planning until recently. Previous work (Wissow and Asai 2023) showed that UCB1, designed for bounded rewards, performs poorly when applied to the cost-to-go estimates of classical planning, which are unbounded in ℝ, and improved performance by using a Gaussian reward MAB instead. We further sharpen our understanding of the ideal bandits for planning tasks by resolving three issues: First, Gaussian MABs under-specify the support of cost-to-go estimates as [−∞, ∞]. Second, the Full-Bellman backup, which backpropagates the max/min of samples, lacks theoretical justification. Third, removing dead-ends lacks justification under Monte-Carlo backup. We use Extreme Value Theory Type 2 to resolve all three issues at once, propose two bandits (UCB1-Uniform/Power), and apply them to MCTS for classical planning. We formally prove their regret bounds and empirically demonstrate their performance in classical planning.
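As context for the bounded-reward assumption the abstract refers to, the following is a minimal Python sketch of standard UCB1 arm selection, not the paper's proposed UCB1-Uniform/Power bandits; the function and variable names are illustrative, not taken from the paper's implementation.

```python
import math

def ucb1_select(counts, means, total_pulls):
    """Pick the arm maximizing the UCB1 index.

    UCB1's confidence radius sqrt(2 ln t / n_i) is derived under the
    assumption that rewards lie in [0, 1]. When the "rewards" are
    cost-to-go estimates unbounded in R, this guarantee no longer
    applies, which is the mismatch the abstract highlights.
    """
    best_arm, best_index = None, float("-inf")
    for arm, (n, mean) in enumerate(zip(counts, means)):
        if n == 0:
            return arm  # pull each arm once before comparing indices
        index = mean + math.sqrt(2.0 * math.log(total_pulls) / n)
        if index > best_index:
            best_arm, best_index = arm, index
    return best_arm
```

In cost-based planning one would minimize a lower-confidence index rather than maximize this one; per the abstract, the proposed UCB1-Uniform/Power bandits replace the bounded-reward assumption with distributional assumptions motivated by Extreme Value Theory Type 2.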
Published: 2024-06-01
Section: Extended Abstracts