TY  - JOUR
AU  - Yang, Qisong
AU  - Simão, Thiago D.
AU  - Tindemans, Simon H.
AU  - Spaan, Matthijs T. J.
PY  - 2021/05/18
Y2  - 2024/03/28
TI  - WCSAC: Worst-Case Soft Actor Critic for Safety-Constrained Reinforcement Learning
JF  - Proceedings of the AAAI Conference on Artificial Intelligence
JA  - AAAI
VL  - 35
IS  - 12
SE  - AAAI Technical Track on Machine Learning V
DO  - 10.1609/aaai.v35i12.17272
UR  - https://ojs.aaai.org/index.php/AAAI/article/view/17272
SP  - 10639
EP  - 10646
AB  - Safe exploration is regarded as a key priority area for reinforcement learning research. With separate reward and safety signals, it is natural to cast it as constrained reinforcement learning, where the expected long-term costs of policies are constrained. However, it can be hazardous to set constraints on the expected safety signal without considering the tail of the distribution. For instance, in safety-critical domains, worst-case analysis is required to avoid disastrous results. We present a novel reinforcement learning algorithm called Worst-Case Soft Actor Critic, which extends the Soft Actor Critic algorithm with a safety critic to achieve risk control. More specifically, a certain level of conditional Value-at-Risk from the distribution is regarded as a safety measure to judge constraint satisfaction, which guides the adjustment of adaptive safety weights to achieve a trade-off between reward and safety. As a result, we can optimize policies under the premise that their worst-case performance satisfies the constraints. The empirical analysis shows that our algorithm attains better risk control compared to expectation-based methods.
ER  -