Value-Decomposition Multi-Agent Actor-Critics

Authors

  • Jianyu Su, University of Virginia
  • Stephen Adams, University of Virginia
  • Peter Beling, University of Virginia

DOI:

https://doi.org/10.1609/aaai.v35i13.17353

Keywords:

Multiagent Learning, Reinforcement Learning

Abstract

The exploitation of extra state information has been an active research area in multi-agent reinforcement learning (MARL). QMIX represents the joint action-value using a non-negative function approximator and achieves the best performance on the StarCraft II micromanagement testbed, a common MARL benchmark. However, our experiments demonstrate that, in some cases, QMIX performs sub-optimally with the A2C framework, a training paradigm that promotes algorithm training efficiency. To obtain a reasonable trade-off between training efficiency and algorithm performance, we extend value-decomposition to actor-critic methods that are compatible with A2C and propose a novel actor-critic framework, value-decomposition actor-critic (VDAC). We evaluate VDAC on the StarCraft II micromanagement task and demonstrate that the proposed framework improves median performance over other actor-critic methods. Furthermore, we use a set of ablation experiments to identify the key factors that contribute to the performance of VDAC.
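Below is a minimal sketch of the value-decomposition actor-critic idea described in the abstract: each agent has its own actor and local critic, the local values are combined into a joint value (a simple sum here, in the spirit of VDAC-sum), and a shared A2C-style advantage trains every agent's policy. The network sizes, class names, and the summation mixer are illustrative assumptions, not the authors' exact architecture.

```python
# Illustrative sketch of a value-decomposition actor-critic (VDAC-sum style).
# Assumes a PyTorch setup; all names and hyperparameters are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AgentActorCritic(nn.Module):
    """Per-agent network: policy logits pi_i(a_i|o_i) and a local value V_i(o_i)."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)
        self.value_head = nn.Linear(hidden, 1)

    def forward(self, obs):
        h = self.body(obs)
        return self.policy_head(h), self.value_head(h).squeeze(-1)


class VDACSum(nn.Module):
    """Joint critic obtained by summing per-agent local values: V_tot = sum_i V_i(o_i)."""

    def __init__(self, n_agents: int, obs_dim: int, n_actions: int):
        super().__init__()
        self.agents = nn.ModuleList(
            AgentActorCritic(obs_dim, n_actions) for _ in range(n_agents)
        )

    def forward(self, obs_batch):
        # obs_batch: (batch, n_agents, obs_dim)
        logits, values = [], []
        for i, agent in enumerate(self.agents):
            lg, v = agent(obs_batch[:, i])
            logits.append(lg)
            values.append(v)
        v_tot = torch.stack(values, dim=1).sum(dim=1)          # (batch,)
        return torch.stack(logits, dim=1), v_tot                # (batch, n_agents, n_actions), (batch,)


def a2c_loss(model, obs, actions, returns, entropy_coef=0.01):
    """A2C-style loss: the shared advantage (return - V_tot) drives every agent's policy gradient."""
    logits, v_tot = model(obs)
    log_probs = F.log_softmax(logits, dim=-1)
    chosen = log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)   # (batch, n_agents)
    advantage = (returns - v_tot).detach()
    policy_loss = -(advantage.unsqueeze(-1) * chosen).mean()
    value_loss = F.mse_loss(v_tot, returns)
    entropy = -(log_probs.exp() * log_probs).sum(-1).mean()
    return policy_loss + 0.5 * value_loss - entropy_coef * entropy
```

The summation mixer keeps the joint value monotonic in each local value, which is the property QMIX enforces with its non-negative mixing network; a learned monotonic mixer conditioned on the global state could replace the plain sum without changing the training loop above.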

Published

2021-05-18

How to Cite

Su, J., Adams, S., & Beling, P. (2021). Value-Decomposition Multi-Agent Actor-Critics. Proceedings of the AAAI Conference on Artificial Intelligence, 35(13), 11352-11360. https://doi.org/10.1609/aaai.v35i13.17353

Section

AAAI Technical Track on Multiagent Systems