AutoCost: Evolving Intrinsic Cost for Zero-Violation Reinforcement Learning

Authors

  • Tairan He, Carnegie Mellon University
  • Weiye Zhao, Carnegie Mellon University
  • Changliu Liu, Carnegie Mellon University

DOI:

https://doi.org/10.1609/aaai.v37i12.26734

Keywords:

General

Abstract

Safety is a critical hurdle that limits the application of deep reinforcement learning (RL) to real-world control tasks. To this end, constrained RL leverages cost functions to improve safety in constrained Markov decision processes. However, constrained methods fail to achieve zero violation even when the cost limit is zero. This paper analyzes the reasons for such failure, and the analysis suggests that a properly designed cost function plays an important role in constrained RL. Inspired by this analysis, we propose AutoCost, a simple yet effective framework that automatically searches for cost functions that help constrained RL achieve zero-violation performance. We validate the proposed method and the searched cost function on the safety benchmark Safety Gym. We compare augmented agents, which use our cost function to provide additive intrinsic costs to a Lagrangian-based policy learner and a constrained-optimization policy learner, against baseline agents that use the same policy learners but only extrinsic costs. Results show that the converged policies with intrinsic costs achieve zero constraint violations in all environments while attaining performance comparable to the baselines.
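
To make the additive-cost mechanism concrete, the sketch below is a minimal, hypothetical illustration in Python: a tiny intrinsic-cost network is evolved so that, when its output is added to the extrinsic cost, a noisy cost-avoiding agent keeps a safety margin and never enters the true violation region. The one-dimensional environment, the stand-in "agent", and all names here are illustrative assumptions, not the paper's code; the actual AutoCost evaluates each candidate cost function by training constrained RL agents in Safety Gym.

import numpy as np

rng = np.random.default_rng(0)

def intrinsic_cost(theta, x):
    # Tiny one-hidden-layer MLP mapping a 1-D state to a non-negative cost.
    w1, b1, w2, b2 = theta
    h = np.tanh(x * w1 + b1)
    return max(float(h @ w2 + b2), 0.0)

def fitness(theta, n=500, noise=0.05):
    # Stand-in for training a constrained learner: the agent prefers large x
    # (task reward) but avoids any state whose *perceived* total cost
    # (extrinsic + intrinsic, estimated with noise) is positive. Extrinsic
    # cost alone leaves no margin, so the noisy agent still violates;
    # an evolved intrinsic cost that fires slightly before x = 0.9 does not.
    reward, violations = 0.0, 0
    for _ in range(n):
        x = rng.random()
        x_seen = x + rng.normal(0.0, noise)            # imperfect cost estimate
        extrinsic_seen = 1.0 if x_seen > 0.9 else 0.0
        if extrinsic_seen + intrinsic_cost(theta, x_seen) > 0.0:
            continue                                   # agent avoids this state
        reward += x
        violations += x > 0.9                          # true violation region
    # Reward high return, heavily penalize violations (zero-violation objective).
    return reward - 1000.0 * violations

# Simple (1+lambda)-style evolutionary search over intrinsic-cost parameters.
theta = [rng.normal(0.0, 0.5, 8), rng.normal(0.0, 0.5, 8),
         rng.normal(0.0, 0.5, 8), np.zeros(1)]
best = fitness(theta)
for gen in range(20):
    for _ in range(16):
        cand = [p + rng.normal(0.0, 0.1, p.shape) for p in theta]
        f = fitness(cand)
        if f > best:
            theta, best = cand, f
    print(f"generation {gen:2d}: best fitness {best:.1f}")

The large violation penalty in the fitness mirrors the zero-violation objective: any candidate cost function that admits even one violation is dominated by one that forfeits a little reward to stay safe.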

Published

2023-06-26

How to Cite

He, T., Zhao, W., & Liu, C. (2023). AutoCost: Evolving Intrinsic Cost for Zero-Violation Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 37(12), 14847-14855. https://doi.org/10.1609/aaai.v37i12.26734

Section

AAAI Special Track on Safe and Robust AI