Multiagent Gumbel MuZero: Efficient Planning in Combinatorial Action Spaces
DOI:
https://doi.org/10.1609/aaai.v38i11.29121
Keywords:
ML: Reinforcement Learning, MAS: Coordination and Collaboration, SO: Sampling/Simulation-based Search
Abstract
AlphaZero and MuZero have achieved state-of-the-art (SOTA) performance in a wide range of domains with discrete and continuous action spaces, including board games and robotics. However, to obtain an improved policy, they often require an excessively large number of simulations, especially in domains with large action spaces, and their performance drops significantly as the simulation budget decreases. In addition, many important real-world applications have combinatorial (or exponential) action spaces, making it infeasible to search directly over all possible actions. In this paper, we extend AlphaZero and MuZero to learn and plan in more complex multiagent (MA) Markov decision processes, where the action space grows exponentially with the number of agents. Our new algorithms, MA Gumbel AlphaZero (without model learning) and MA Gumbel MuZero (with model learning), achieve superior performance on cooperative multiagent control problems while reducing the number of environmental interactions by up to an order of magnitude compared to model-free approaches. In particular, we significantly improve on prior performance when planning with much smaller simulation budgets. The code and appendix are available at https://github.com/tjuHaoXiaotian/MA-MuZero.
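For intuition, below is a minimal NumPy sketch of the Gumbel-Top-k trick that the Gumbel variants of AlphaZero/MuZero build on: adding i.i.d. Gumbel(0, 1) noise to the policy logits and keeping the k largest perturbed values samples k distinct actions without replacement, so search can spend its simulation budget on a small candidate set instead of enumerating the full (combinatorial) action space. The function name and toy sizes here are illustrative assumptions, not code from the paper.

import numpy as np

def gumbel_top_k(logits, k, rng):
    # Gumbel-Top-k trick: perturbing the logits with i.i.d. Gumbel(0, 1)
    # noise and taking the k largest perturbed values is equivalent to
    # sampling k distinct actions without replacement from softmax(logits).
    gumbels = rng.gumbel(size=logits.shape)
    return np.argsort(logits + gumbels)[::-1][:k]

# Toy usage: shortlist 4 of 10 actions as root candidates for search.
rng = np.random.default_rng(0)
logits = rng.normal(size=10)
print(gumbel_top_k(logits, k=4, rng=rng))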
Published
2024-03-24
How to Cite
Hao, X., Hao, J., Xiao, C., Li, K., Li, D., & Zheng, Y. (2024). Multiagent Gumbel MuZero: Efficient Planning in Combinatorial Action Spaces. Proceedings of the AAAI Conference on Artificial Intelligence, 38(11), 12304-12312. https://doi.org/10.1609/aaai.v38i11.29121
Issue
Vol. 38 No. 11 (2024)
Section
AAAI Technical Track on Machine Learning II