Gradient-Protected Value Decomposition for Cooperative Multi-Agent Reinforcement Learning
DOI:
https://doi.org/10.1609/aaai.v40i26.39329

Abstract
In recent years, deep multi-agent reinforcement learning (MARL) has demonstrated remarkable potential for solving complex cooperative tasks by enabling decentralized yet efficient coordination among agents. However, during training, agent policy updates induced by different joint action samples may conflict, leading to gradient interference that hinders convergence and the emergence of coordinated behavior. In this paper, we analyze and empirically validate the phenomenon of gradient interference. To address it, we propose Gradient-Protected Value Decomposition (GPVD), a novel MARL framework that explicitly protects the gradient signals of optimal collaborative actions by suppressing the impact of interfering actions. GPVD employs a dynamic gradient protection mechanism that identifies optimal collaborative joint actions and reweights the loss to attenuate gradients from non-collaborative interfering actions. To identify high-value collaborative actions effectively, we apply SimHash-based state grouping to discover consistent collaboration patterns across similar states. Furthermore, a count-based intrinsic reward is incorporated to encourage exploration and improve coverage of potentially optimal joint actions. Experiments on challenging multi-agent benchmarks demonstrate that GPVD achieves faster convergence, stronger coordination, and greater training stability than state-of-the-art value decomposition methods.
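The SimHash-based state grouping and count-based intrinsic reward mentioned in the abstract follow a standard pattern in exploration literature: project a state through a fixed random matrix, take the sign pattern as a binary code so that similar states fall into the same bucket, and grant a bonus that decays with the visit count of that bucket. The sketch below illustrates this general technique only; the class name, hyperparameters (`n_bits`, `beta`), and bonus schedule are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np
from collections import defaultdict

class SimHashCounter:
    """Illustrative SimHash state grouping with a count-based bonus.

    NOTE: a generic sketch of the standard technique, not GPVD's
    actual implementation; all hyperparameters are assumptions.
    """

    def __init__(self, state_dim, n_bits=16, beta=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random projection; rows define n_bits hyperplanes.
        self.A = rng.standard_normal((n_bits, state_dim))
        self.beta = beta                    # bonus scale
        self.counts = defaultdict(int)      # visits per hash bucket

    def hash_key(self, state):
        # Sign of each projection -> binary code; nearby states tend
        # to share the same code, grouping similar states together.
        bits = (self.A @ np.asarray(state, dtype=float)) > 0
        return bits.tobytes()

    def bonus(self, state):
        # Count-based intrinsic reward: beta / sqrt(n(phi(s))).
        key = self.hash_key(state)
        self.counts[key] += 1
        return self.beta / np.sqrt(self.counts[key])
```

A repeated visit to the same (or a very similar) state reuses the same bucket, so the bonus shrinks as `1/sqrt(n)`, steering exploration toward rarely seen regions.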
Published
2026-03-14
How to Cite
Hou, J., Dou, H., Dang, L., Chen, L., & Ge, C. (2026). Gradient-Protected Value Decomposition for Cooperative Multi-Agent Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(26), 21779–21787. https://doi.org/10.1609/aaai.v40i26.39329
Section
AAAI Technical Track on Machine Learning III