P2BPO: Permeable Penalty Barrier-Based Policy Optimization for Safe RL

Authors

  • Sumanta Dey, Indian Institute of Technology Kharagpur
  • Pallab Dasgupta, Synopsys
  • Soumyajit Dey, Indian Institute of Technology Kharagpur

DOI:

https://doi.org/10.1609/aaai.v38i19.30094

Keywords:

General

Abstract

Safe Reinforcement Learning (SRL) algorithms aim to learn a policy that maximizes the reward while satisfying the safety constraints. One of the challenges in SRL is that it is often difficult to balance the two objectives of reward maximization and safety constraint satisfaction. Existing algorithms rely on constrained optimization techniques such as penalty-based, barrier penalty-based, and Lagrangian-based dual or primal policy optimization methods. However, these methods suffer from training oscillations and approximation errors, which degrade both learning objectives. This paper proposes the Permeable Penalty Barrier-based Policy Optimization (P2BPO) algorithm, which addresses these issues by allowing a small fraction of penalty beyond the penalty barrier, with a parameter controlling this permeability. In addition, an adaptive penalty parameter is used instead of a constant one; it is initialized with a low value and increased gradually as the agent violates the safety constraints. We also provide a theoretical proof of the proposed method's performance guarantee bound, which ensures that P2BPO can learn a policy satisfying the safety constraints with high probability while achieving a higher expected reward. Furthermore, we compare P2BPO with other SRL algorithms on various SRL tasks and demonstrate that it achieves better rewards while adhering to the constraints.
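
The two ideas named in the abstract, a barrier that stays finite past the constraint limit and a penalty coefficient that starts low and grows on violations, can be illustrated with a minimal Python sketch. The abstract does not give the actual P2BPO formulas, so everything here is an assumption: the penalty is a linearly extended log barrier chosen as a stand-in, and the names `permeable_barrier_penalty`, `update_penalty_coefficient`, `permeability`, `growth`, and `coef_max` are hypothetical.

```python
import math


def permeable_barrier_penalty(cost, limit, permeability):
    """Stand-in penalty (NOT the paper's formula): a log barrier on the
    slack `limit - cost`, continued linearly once the slack drops below
    `permeability`, so the penalty stays finite beyond the barrier."""
    slack = limit - cost
    if slack > permeability:
        return -math.log(slack)
    # Linear extension; value and slope match the log barrier
    # at slack == permeability.
    return -math.log(permeability) + (permeability - slack) / permeability


def update_penalty_coefficient(coef, cost, limit, growth=1.5, coef_max=100.0):
    """Hypothetical adaptive rule: grow the coefficient only when the
    observed cost violates the limit, capped at `coef_max`."""
    if cost > limit:
        return min(coef * growth, coef_max)
    return coef


def penalized_objective(reward, cost, limit, coef, permeability):
    """Scalar surrogate: reward minus the weighted stand-in penalty."""
    return reward - coef * permeable_barrier_penalty(cost, limit, permeability)


if __name__ == "__main__":
    coef = 0.1  # assumed low initial value
    for episode_cost in [10.0, 24.8, 27.0, 30.0, 22.0]:
        coef = update_penalty_coefficient(coef, episode_cost, limit=25.0)
        value = penalized_objective(100.0, episode_cost, 25.0, coef, 0.5)
        print(f"cost={episode_cost:5.1f}  coef={coef:.4f}  objective={value:.3f}")
```

In this sketch a smaller `permeability` makes the penalty rise more steeply near and beyond the limit, while a larger one lets more of the violation through before the penalty dominates the reward term.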

Published

2024-03-24

How to Cite

Dey, S., Dasgupta, P., & Dey, S. (2024). P2BPO: Permeable Penalty Barrier-Based Policy Optimization for Safe RL. Proceedings of the AAAI Conference on Artificial Intelligence, 38(19), 21029-21036. https://doi.org/10.1609/aaai.v38i19.30094

Section

AAAI Technical Track on Safe, Robust and Responsible AI Track