Multi-Agent Tree Search with Dynamic Reward Shaping
Keywords: Multi-Agent Planning, Multi-Agent Reinforcement Learning, Planning And Learning, Reward Shaping, Monte Carlo Tree Search, Mixed Cooperative-Competitive, Emergent Behavior, Linear Temporal Logic, Non-Markovian Reinforcement Learning
Abstract
Sparse rewards and their representation in multi-agent domains remain a challenge for the development of multi-agent planning systems. While techniques from formal methods can be adopted to represent the underlying planning objectives, their use in facilitating and accelerating learning has received limited attention in multi-agent settings. Reward shaping methods that leverage such formal representations in single-agent settings are typically static, in the sense that the artificial rewards remain the same throughout the entire learning process. In contrast, we investigate the use of such formal objective representations to define novel reward shaping functions that capture the learned experience of the agents. More specifically, we leverage the automaton representation of the underlying team objectives in mixed cooperative-competitive domains such that each automaton transition is assigned an expected value proportional to the frequency with which it was observed in successful trajectories of past behavior. This form of dynamic reward shaping is proposed within a multi-agent tree search architecture wherein agents can simultaneously reason about the future behavior of other agents as well as their own future behavior.
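To make the idea of frequency-based transition values concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes that a trajectory is summarized by the automaton transitions it triggered, and that a transition's value is the fraction of successful trajectories in which it appeared. The class name `TransitionShaper`, its methods, and the `scale` parameter are hypothetical names introduced for illustration only; the abstract does not specify the exact normalization or how the values enter the tree search.

```python
from collections import Counter


class TransitionShaper:
    """Hypothetical sketch: values for automaton transitions that
    change as successful trajectories accumulate."""

    def __init__(self, scale=1.0):
        self.success_counts = Counter()  # transition -> number of successful trajectories containing it
        self.num_successes = 0           # successful trajectories seen so far
        self.scale = scale               # assumed constant of proportionality

    def record(self, transitions, success):
        """Update the counts from one finished trajectory."""
        if not success:
            return  # only successful trajectories contribute
        self.num_successes += 1
        for t in set(transitions):  # count each transition once per trajectory
            self.success_counts[t] += 1

    def value(self, transition):
        """Shaping value proportional to the observed success frequency."""
        if self.num_successes == 0:
            return 0.0
        return self.scale * self.success_counts[transition] / self.num_successes


# Example with made-up automaton states q0..q3.
shaper = TransitionShaper()
shaper.record([("q0", "q1"), ("q1", "q2")], success=True)
shaper.record([("q0", "q1"), ("q1", "q0")], success=False)
shaper.record([("q0", "q1"), ("q1", "q3")], success=True)
print(shaper.value(("q0", "q1")))  # appeared in both successful trajectories
print(shaper.value(("q1", "q2")))  # appeared in one of the two
```

Because the counts are updated after every trajectory, the shaping values shift as the agents gain experience, which is the contrast with static shaping drawn in the abstract.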
How to Cite
Velasquez, A., Bissey, B., Barak, L., Melcer, D., Beckus, A., Alkhouri, I., & Atia, G. (2022). Multi-Agent Tree Search with Dynamic Reward Shaping. Proceedings of the International Conference on Automated Planning and Scheduling, 32(1), 652-661. Retrieved from https://ojs.aaai.org/index.php/ICAPS/article/view/19854
Planning and Learning Track