Safe Multi-Agent Reinforcement Learning via Distributional Safety Critic and Maximum Entropy Optimization
DOI: https://doi.org/10.1609/aaai.v40i35.40198
Abstract
Deploying multi-agent reinforcement learning (MARL) in safety-critical systems faces significant challenges due to insufficient agent exploration and inadequate safety constraint guarantees. Current approaches suffer from two fundamental limitations: inefficient exploration that leads to suboptimal policies, and expected-cost-based constraint frameworks that fail to ensure full-process safety. To address these challenges, this paper proposes a novel safety-aware maximum entropy MARL framework using Conditional Value-at-Risk (CVaR) as a joint safety metric, which quantifies constraint satisfaction under worst-case scenarios for multi-agent systems. Moreover, we develop the Worst-Case Multi-Agent Soft Actor-Critic (WCMASAC) algorithm, incorporating sequential update mechanisms and maximum entropy optimization for heterogeneous agents, enhanced with distributional safety critics. Theoretically, we establish the monotonic improvement property, guaranteed constraint satisfaction, and convergence to a generalized Nash equilibrium for WCMASAC. Extensive experiments on Safety-Gymnasium-based benchmarks demonstrate that WCMASAC outperforms state-of-the-art baselines in both task reward acquisition and safety constraint violation reduction, while exhibiting superior exploration efficiency and risk-aware control capabilities.
Published
2026-03-14
How to Cite
Liu, Q., Yuan, Y., Zhang, L., Chen, K., Lv, Y., Gao, S., & Yan, H. (2026). Safe Multi-Agent Reinforcement Learning via Distributional Safety Critic and Maximum Entropy Optimization. Proceedings of the AAAI Conference on Artificial Intelligence, 40(35), 29555-29563. https://doi.org/10.1609/aaai.v40i35.40198
Section
AAAI Technical Track on Multiagent Systems