Intrinsic Action Tendency Consistency for Cooperative Multi-Agent Reinforcement Learning
DOI:
https://doi.org/10.1609/aaai.v38i16.29711
Keywords:
MAS: Coordination and Collaboration, MAS: Multiagent Learning
Abstract
Efficient collaboration in the centralized training with decentralized execution (CTDE) paradigm remains a challenge in cooperative multi-agent systems. We identify divergent action tendencies among agents as a significant obstacle to CTDE's training efficiency, which requires a large number of training samples for agents' policies to reach a unified consensus. This divergence stems from the lack of adequate team consensus-related guidance signals during credit assignment in CTDE. To address this, we propose Intrinsic Action Tendency Consistency, a novel approach for cooperative multi-agent reinforcement learning. It integrates intrinsic rewards, obtained through an action model, into a reward-additive CTDE (RA-CTDE) framework. We formulate an action model that enables surrounding agents to predict the central agent's action tendency. Leveraging these predictions, we compute a cooperative intrinsic reward that encourages agents to align their actions with their neighbors' predictions. We establish the equivalence between RA-CTDE and CTDE through theoretical analyses, demonstrating that CTDE's training process can be achieved using N individual targets. Building on this insight, we introduce a novel method to combine intrinsic rewards and RA-CTDE. Extensive experiments on challenging tasks in SMAC, MPE, and GRF benchmarks showcase the improved performance of our method.
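For intuition only, the sketch below is not the authors' implementation: the action-model architecture, the exact form of the intrinsic reward, and how it is combined with the environment reward inside RA-CTDE are all assumptions. It merely illustrates one plausible way a central agent's intrinsic reward could be computed from its neighbors' predicted action-tendency distributions, as described in the abstract.

```python
import numpy as np

def action_tendency_intrinsic_reward(neighbor_predictions, chosen_action):
    """Hypothetical sketch of an action-tendency-consistency reward.

    neighbor_predictions: list of probability vectors over the central
        agent's action space, one per surrounding agent (assumed outputs
        of an action model).
    chosen_action: index of the action the central agent actually took.
    Returns a scalar intrinsic reward in [0, 1].
    """
    if not neighbor_predictions:
        return 0.0
    # Average the neighbors' predicted action distributions into a consensus.
    consensus = np.mean(np.stack(neighbor_predictions), axis=0)
    # Reward is the consensus probability mass on the chosen action:
    # high when neighbors anticipated this action, low otherwise.
    return float(consensus[chosen_action])

# Example: three neighbors predict over a 4-action space.
preds = [np.array([0.7, 0.1, 0.1, 0.1]),
         np.array([0.6, 0.2, 0.1, 0.1]),
         np.array([0.5, 0.3, 0.1, 0.1])]
r_int = action_tendency_intrinsic_reward(preds, chosen_action=0)  # = 0.6
```

Under this reading, the intrinsic reward would then be added to the per-agent learning target in the reward-additive CTDE formulation, though the paper's actual weighting and target construction should be taken from the full text.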
Published
2024-03-24
How to Cite
Zhang, J., Zhang, Y., Zhang, X. S., Zang, Y., & Cheng, J. (2024). Intrinsic Action Tendency Consistency for Cooperative Multi-Agent Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 38(16), 17600-17608. https://doi.org/10.1609/aaai.v38i16.29711
Issue
Section
AAAI Technical Track on Multiagent Systems