TY - JOUR
AU - Li, Shihui
AU - Wu, Yi
AU - Cui, Xinyue
AU - Dong, Honghua
AU - Fang, Fei
AU - Russell, Stuart
PY - 2019/07/17
Y2 - 2024/03/28
TI - Robust Multi-Agent Reinforcement Learning via Minimax Deep Deterministic Policy Gradient
JF - Proceedings of the AAAI Conference on Artificial Intelligence
JA - AAAI
VL - 33
IS - 01
SE - AAAI Technical Track: Machine Learning
DO - 10.1609/aaai.v33i01.33014213
UR - https://ojs.aaai.org/index.php/AAAI/article/view/4327
SP - 4213-4220
AB - Despite the recent advances of deep reinforcement learning (DRL), agents trained by DRL tend to be brittle and sensitive to the training environment, especially in multi-agent scenarios. In the multi-agent setting, a DRL agent's policy can easily get stuck in a poor local optimum w.r.t. its training partners: the learned policy may be optimal only with respect to the other agents' current policies. In this paper, we focus on the problem of training robust DRL agents with continuous actions in the multi-agent learning setting, so that the trained agents can still generalize when their opponents' policies change. To tackle this problem, we propose a new algorithm, MiniMax Multi-agent Deep Deterministic Policy Gradient (M3DDPG), with the following contributions: (1) we introduce a minimax extension of the popular multi-agent deep deterministic policy gradient algorithm (MADDPG) for robust policy learning; (2) since the continuous action space makes our minimax learning objective computationally intractable, we propose Multi-Agent Adversarial Learning (MAAL) to efficiently solve the proposed formulation. We empirically evaluate our M3DDPG algorithm in four mixed cooperative and competitive multi-agent environments, and agents trained by our method significantly outperform existing baselines.
ER -