Safe Multi-Agent Reinforcement Learning via Distributional Safety Critic and Maximum Entropy Optimization
DOI: https://doi.org/10.1609/aaai.v40i35.40198
Abstract
Deploying multi-agent reinforcement learning (MARL) in safety-critical systems faces significant challenges due to insufficient agent exploration and inadequate safety constraint guarantees. Current approaches suffer from two fundamental limitations: inefficient exploration that leads to suboptimal policies, and expected-cost-based constraint frameworks that fail to ensure full-process safety. To address these challenges, this paper proposes a novel safety-aware maximum entropy MARL framework using Conditional Value-at-Risk (CVaR) as a joint safety metric, which quantifies constraint satisfaction under worst-case scenarios for multi-agent systems. Moreover, we develop the Worst-Case Multi-Agent Soft Actor-Critic (WCMASAC) algorithm, incorporating sequential update mechanisms and maximum entropy optimization for heterogeneous agents, enhanced with distributional safety critics. Theoretically, we establish the monotonic improvement property, guaranteed constraint satisfaction, and convergence to a generalized Nash equilibrium for WCMASAC. Extensive experiments on Safety-Gymnasium-based benchmarks demonstrate that WCMASAC outperforms state-of-the-art baselines in both task reward acquisition and safety constraint violation reduction, while exhibiting superior exploration efficiency and risk-aware control capabilities.
Published
2026-03-14
How to Cite
Liu, Q., Yuan, Y., Zhang, L., Chen, K., Lv, Y., Gao, S., & Yan, H. (2026). Safe Multi-Agent Reinforcement Learning via Distributional Safety Critic and Maximum Entropy Optimization. Proceedings of the AAAI Conference on Artificial Intelligence, 40(35), 29555-29563. https://doi.org/10.1609/aaai.v40i35.40198
Section
AAAI Technical Track on Multiagent Systems