Bilevel MCTS for Amortized O(1) Node Selection in Classical Planning
DOI:
https://doi.org/10.1609/aaai.v40i43.40933Abstract
We study an efficient implementation of Multi-Armed Bandit (MAB)-based Monte-Carlo Tree Search (MCTS) for classical planning. One weakness of MCTS is that it spends a significant time in deciding which node to expand next. While selecting a node from an OPEN list with N nodes has O(1) runtime complexity with traditional array-based priority-queues for dense integer keys, the tree-based OPEN list used by MCTS requires O(log N), which roughly corresponds to the search depth d. In classical planning, d is arbitrarily large (e.g., 2^k-1 in k-disk Tower-of-Hanoi) and the runtime for node selection is significant, unlike in game tree search, where the cost is negligible compared to the node evaluation (rollouts) because d is inherently limited by the game (e.g. d≦361 in Go). To improve this bottleneck, we propose a bilevel modification to MCTS that runs a best-first search from each selected leaf nodes with an expansion budget proportional to d, which achieves amortized O(1) runtime for node selection, equivalent to traditional queue-based OPEN list. In addition, we introduce Tree Collapsing, an enhancement that reduces action selection steps and further improves the performance.Downloads
Published
2026-03-14
How to Cite
Asai, M. (2026). Bilevel MCTS for Amortized O(1) Node Selection in Classical Planning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(43), 36155–36162. https://doi.org/10.1609/aaai.v40i43.40933
Issue
Section
AAAI Technical Track on Planning, Routing, and Scheduling