Learning Explicit Credit Assignment for Cooperative Multi-Agent Reinforcement Learning via Polarization Policy Gradient

Authors

  • Wubing Chen Nanjing University
  • Wenbin Li Nanjing University
  • Xiao Liu Nanjing University
  • Shangdong Yang Nanjing University of Posts and Telecommunications; Nanjing University
  • Yang Gao Nanjing University

DOI:

https://doi.org/10.1609/aaai.v37i10.26364

Keywords:

MAS: Multiagent Learning, MAS: Coordination and Collaboration, ML: Reinforcement Learning Algorithms, GTEP: Cooperative Game Theory

Abstract

Cooperative multi-agent policy gradient (MAPG) algorithms have recently attracted wide attention and are regarded as a general scheme for multi-agent systems. Credit assignment plays an important role in MAPG and can induce cooperation among multiple agents. However, most MAPG algorithms cannot achieve good credit assignment because of the game-theoretic pathology known as centralized-decentralized mismatch. To address this issue, this paper presents a novel method, Multi-Agent Polarization Policy Gradient (MAPPG). MAPPG uses a simple but efficient polarization function to transform the optimal consistency of joint and individual actions into easily realized constraints, thus enabling efficient credit assignment. Theoretically, we prove that the individual policies of MAPPG can converge to the global optimum. Empirically, we evaluate MAPPG on well-known matrix and differential games, and verify that it can converge to the global optimum in both discrete and continuous action spaces. We also evaluate MAPPG on a set of StarCraft II micromanagement tasks and demonstrate that it outperforms state-of-the-art MAPG algorithms.

Published

2023-06-26

How to Cite

Chen, W., Li, W., Liu, X., Yang, S., & Gao, Y. (2023). Learning Explicit Credit Assignment for Cooperative Multi-Agent Reinforcement Learning via Polarization Policy Gradient. Proceedings of the AAAI Conference on Artificial Intelligence, 37(10), 11542-11550. https://doi.org/10.1609/aaai.v37i10.26364

Section

AAAI Technical Track on Multiagent Systems