Safe Multi-Agent Reinforcement Learning via Distributional Safety Critic and Maximum Entropy Optimization

Authors

  • Qiwei Liu, East China University of Science and Technology; Tongji University
  • Ye Yuan, East China University of Science and Technology
  • Lingyue Zhang, East China University of Science and Technology
  • Kaitian Chen, East China University of Science and Technology
  • Yunkai Lv, East China University of Science and Technology; Shanghai Jiao Tong University
  • Sheng Gao, East China University of Science and Technology
  • Huaicheng Yan, East China University of Science and Technology; Shanghai University of Electric Power

DOI:

https://doi.org/10.1609/aaai.v40i35.40198

Abstract

Deploying multi-agent reinforcement learning (MARL) in safety-critical systems faces significant challenges due to insufficient agent exploration and inadequate safety constraint guarantees. Current approaches suffer from two fundamental limitations: inefficient exploration that leads to suboptimal policies, and expected-cost-based constraint frameworks that fail to ensure full-process safety. To address these challenges, this paper proposes a novel safety-aware maximum entropy MARL framework using Conditional Value-at-Risk (CVaR) as a joint safety metric, which quantifies constraint satisfaction under worst-case scenarios for multi-agent systems. Moreover, we develop the Worst-Case Multi-Agent Soft Actor-Critic (WCMASAC) algorithm, incorporating sequential update mechanisms and maximum entropy optimization for heterogeneous agents, enhanced with distributional safety critics. Theoretically, we establish the monotonic improvement property, guaranteed constraint satisfaction, and convergence to a generalized Nash equilibrium for WCMASAC. Extensive experiments on Safety-Gymnasium-based benchmarks demonstrate that WCMASAC outperforms state-of-the-art baselines in both task reward acquisition and safety constraint violation reduction, while exhibiting superior exploration efficiency and risk-aware control capabilities.
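The abstract's central safety metric, Conditional Value-at-Risk (CVaR), measures the expected cost in the worst-case tail of the cost distribution rather than its mean. The following is a minimal empirical sketch of that idea (not the paper's implementation, and the function name and parameters are illustrative): given sampled episode costs, CVaR at level alpha is the average of the worst (1 - alpha) fraction of costs, so constraining it penalizes rare but severe safety violations that an expected-cost constraint would average away.

```python
import numpy as np

def empirical_cvar(costs, alpha=0.9):
    """Empirical CVaR of a cost sample at confidence level alpha.

    VaR_alpha is the alpha-quantile of the costs; CVaR_alpha is the
    mean of all costs at or above that quantile (the worst-case tail).
    """
    costs = np.sort(np.asarray(costs, dtype=float))
    var = np.quantile(costs, alpha)          # Value-at-Risk threshold
    tail = costs[costs >= var]               # worst (1 - alpha) fraction
    return tail.mean()

# Ten episode costs 0..9: at alpha = 0.8 only the two worst episodes
# (costs 8 and 9) enter the tail, so CVaR = 8.5, versus a mean of 4.5.
print(empirical_cvar(range(10), alpha=0.8))  # → 8.5
print(empirical_cvar(range(10), alpha=0.0))  # → 4.5 (reduces to the mean)
```

As alpha → 0 the metric reduces to the ordinary expected cost, which is why a CVaR constraint strictly generalizes the expected-cost constraints the abstract criticizes.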

Published

2026-03-14

How to Cite

Liu, Q., Yuan, Y., Zhang, L., Chen, K., Lv, Y., Gao, S., & Yan, H. (2026). Safe Multi-Agent Reinforcement Learning via Distributional Safety Critic and Maximum Entropy Optimization. Proceedings of the AAAI Conference on Artificial Intelligence, 40(35), 29555-29563. https://doi.org/10.1609/aaai.v40i35.40198

Section

AAAI Technical Track on Multiagent Systems