AIR: Unifying Individual and Collective Exploration in Cooperative Multi-Agent Reinforcement Learning

Authors

  • Guangchong Zhou The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
  • Zeren Zhang The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
  • Guoliang Fan The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences

DOI:

https://doi.org/10.1609/aaai.v39i21.34454

Abstract

Exploration in cooperative multi-agent reinforcement learning (MARL) remains challenging for value-based agents due to the absence of an explicit policy. Existing approaches include individual exploration based on uncertainty towards the system and collective exploration through behavioral diversity among agents. However, the introduction of additional structures often leads to reduced training efficiency and infeasible integration of these methods. In this paper, we propose Adaptive exploration via Identity Recognition (AIR), which consists of two adversarial components: a classifier that recognizes agent identities from their trajectories, and an action selector that adaptively adjusts the mode and degree of exploration. We theoretically prove that AIR can facilitate both individual and collective exploration during training, and experiments also demonstrate the efficiency and effectiveness of AIR across various tasks.
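
The two components named in the abstract can be pictured with a toy sketch. This is not the authors' implementation: the abstract does not specify the architectures, the losses, or how the action selector uses the classifier. The function names, the softmax cross-entropy used for identity recognition, and the temperature-scaled (Boltzmann) action selection are all illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def identity_cross_entropy(logits, agent_id):
    """Hypothetical identity-recognition loss for one trajectory.

    `logits` holds one classifier score per agent; `agent_id` is the index
    of the agent that actually produced the trajectory. A low value means
    the trajectory is easy to attribute to its agent.
    """
    return -math.log(softmax(logits)[agent_id])


def boltzmann_policy(q_values, temperature):
    """Hypothetical action selector for a value-based agent.

    Turns Q-values into action probabilities. A higher temperature gives a
    flatter distribution, i.e. a higher degree of exploration.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return softmax([q / temperature for q in q_values])
```

In this sketch, the classifier loss measures how distinguishable the agents' behaviors are, and the temperature is one possible knob for the degree of exploration; how AIR actually couples the two is described in the paper itself.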

Published

2025-04-11

How to Cite

Zhou, G., Zhang, Z., & Fan, G. (2025). AIR: Unifying Individual and Collective Exploration in Cooperative Multi-Agent Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 39(21), 22919-22927. https://doi.org/10.1609/aaai.v39i21.34454

Issue

Section

AAAI Technical Track on Machine Learning VII