Intrinsic Action Tendency Consistency for Cooperative Multi-Agent Reinforcement Learning

Authors

  • Junkai Zhang (Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences)
  • Yifan Zhang (Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Nanjing; Nanjing Artificial Intelligence Research of AI)
  • Xi Sheryl Zhang (Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Nanjing; Nanjing Artificial Intelligence Research of AI)
  • Yifan Zang (Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences)
  • Jian Cheng (Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Nanjing; Nanjing Artificial Intelligence Research of AI)

DOI:

https://doi.org/10.1609/aaai.v38i16.29711

Keywords:

MAS: Coordination and Collaboration, MAS: Multiagent Learning

Abstract

Efficient collaboration in the centralized training with decentralized execution (CTDE) paradigm remains a challenge in cooperative multi-agent systems. We identify divergent action tendencies among agents as a significant obstacle to CTDE's training efficiency: a large number of training samples is needed before agents' policies converge to a unified consensus. This divergence stems from the lack of adequate team-consensus guidance signals during credit assignment in CTDE. To address this, we propose Intrinsic Action Tendency Consistency, a novel approach for cooperative multi-agent reinforcement learning. It integrates intrinsic rewards, obtained through an action model, into a reward-additive CTDE (RA-CTDE) framework. We formulate an action model that enables surrounding agents to predict the central agent's action tendency. Leveraging these predictions, we compute a cooperative intrinsic reward that encourages agents to align their actions with their neighbors' predictions. Through theoretical analysis, we establish the equivalence between RA-CTDE and CTDE, showing that CTDE's training process can be achieved with N individual targets. Building on this insight, we introduce a novel method for combining intrinsic rewards with RA-CTDE. Extensive experiments on challenging tasks in the SMAC, MPE, and GRF benchmarks demonstrate the improved performance of our method.
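
To make the abstract's mechanism concrete, below is a minimal Python sketch of the action-model prediction and the cooperative intrinsic reward, assuming a discrete action space and a simple feed-forward predictor. All names (ActionModel, intrinsic_reward), the network shape, and the log-probability aggregation are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ActionModel(nn.Module):
    """Hypothetical action model: predicts a distribution over the central
    agent's actions from a single neighbor's local observation."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, neighbor_obs: torch.Tensor) -> torch.Tensor:
        # Softmax over the action dimension -> predicted action tendency.
        return torch.softmax(self.net(neighbor_obs), dim=-1)

def intrinsic_reward(action_model: ActionModel,
                     neighbor_obs: torch.Tensor,
                     central_action: int) -> torch.Tensor:
    """Cooperative intrinsic reward for one central agent: the mean
    log-probability its neighbors assign to the action it actually took.
    (Mean log-probability is one plausible aggregation, chosen here for
    illustration; the paper's exact form may differ.)"""
    probs = action_model(neighbor_obs)     # (n_neighbors, n_actions)
    picked = probs[:, central_action]      # each neighbor's prob of that action
    return picked.log().mean()

# Usage: 3 neighbors, 32-dim local observations, 5 discrete actions.
model = ActionModel(obs_dim=32, n_actions=5)
obs = torch.randn(3, 32)
r_int = intrinsic_reward(model, obs, central_action=2)
```

Under this reading, the intrinsic reward is highest when the central agent's chosen action matches what its neighbors predicted, which is what drives the action-tendency consistency described in the abstract; the RA-CTDE framework then adds such per-agent intrinsic terms to the N individual training targets.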

Published

2024-03-24

How to Cite

Zhang, J., Zhang, Y., Zhang, X. S., Zang, Y., & Cheng, J. (2024). Intrinsic Action Tendency Consistency for Cooperative Multi-Agent Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 38(16), 17600-17608. https://doi.org/10.1609/aaai.v38i16.29711

Issue

Vol. 38 No. 16 (2024)

Section

AAAI Technical Track on Multiagent Systems