Provably Efficient Primal-Dual Reinforcement Learning for CMDPs with Non-stationary Objectives and Constraints

Authors

  • Yuhao Ding, University of California - Berkeley
  • Javad Lavaei, University of California - Berkeley

DOI:

https://doi.org/10.1609/aaai.v37i6.25900

Keywords:

ML: Reinforcement Learning Theory, ML: Reinforcement Learning Algorithms

Abstract

We consider primal-dual-based reinforcement learning (RL) in episodic constrained Markov decision processes (CMDPs) with non-stationary objectives and constraints, a setting that plays a central role in ensuring the safety of RL in time-varying environments. In this problem, the reward/utility functions and the state transition functions are both allowed to vary arbitrarily over time, as long as their cumulative variations do not exceed certain known variation budgets. Designing safe RL algorithms in time-varying environments is particularly challenging because of the need to integrate constraint violation reduction, safe exploration, and adaptation to non-stationarity. To this end, we identify two alternative conditions on the time-varying constraints under which safety can be guaranteed in the long run. We also propose the Periodically Restarted Optimistic Primal-Dual Proximal Policy Optimization (PROPD-PPO) algorithm, which works under either of the two conditions. Furthermore, we establish a dynamic regret bound and a constraint violation bound for the proposed algorithm under each of the two conditions, in both the linear kernel CMDP function approximation setting and the tabular CMDP setting. This paper provides the first provably efficient algorithm for non-stationary CMDPs with safe exploration.
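To make the primal-dual mechanics concrete, here is a minimal sketch in Python, assuming a single-state constrained problem whose per-episode reward and utility vectors are known exactly. It is not PROPD-PPO: it omits optimism, transition learning, and function approximation. It only illustrates three generic ingredients named in the abstract: a KL-proximal (exponentiated-weights) policy step on the Lagrangian, a projected dual step that raises the multiplier λ while the utility constraint is violated, and a periodic restart that discards stale state in a drifting environment. The function name `primal_dual_restarted` and the parameters `alpha`, `eta`, `lam_max`, and `restart_period` are hypothetical.

```python
import math


def primal_dual_restarted(reward_fn, utility_fn, n_actions, threshold,
                          episodes, restart_period,
                          alpha=0.1, eta=0.1, lam_max=10.0):
    """Hypothetical primal-dual sketch with periodic restarts (not PROPD-PPO).

    Single-state problem: maximize E_pi[r_k] subject to E_pi[g_k] >= threshold,
    where r_k = reward_fn(k) and g_k = utility_fn(k) may change with k.
    Values are computed exactly (no exploration, no optimism bonus).
    Returns a list of (reward value, utility value, lambda) per episode.
    """
    history = []
    policy, lam = None, 0.0
    for k in range(episodes):
        if k % restart_period == 0:
            # Periodic restart: reset to the uniform policy and a zero dual variable
            policy = [1.0 / n_actions] * n_actions
            lam = 0.0
        r, g = reward_fn(k), utility_fn(k)
        v_r = sum(p * x for p, x in zip(policy, r))
        v_g = sum(p * x for p, x in zip(policy, g))
        history.append((v_r, v_g, lam))

        # Primal step: KL-proximal (exponentiated-weights) ascent on r + lam * g
        weights = [p * math.exp(alpha * (ri + lam * gi))
                   for p, ri, gi in zip(policy, r, g)]
        total = sum(weights)
        policy = [w / total for w in weights]

        # Dual step: projected descent on lam over [0, lam_max];
        # lam grows while the utility constraint v_g >= threshold is violated
        lam = min(lam_max, max(0.0, lam - eta * (v_g - threshold)))
    return history


# Usage: the reward switches which action it favors at episode 50,
# while the utility always favors action 1.
history = primal_dual_restarted(
    reward_fn=lambda k: [1.0, 0.0] if k < 50 else [0.0, 1.0],
    utility_fn=lambda k: [0.0, 1.0],
    n_actions=2, threshold=0.7, episodes=100, restart_period=50)
for k in (0, 49, 50, 99):
    v_r, v_g, lam = history[k]
    print(f"episode {k}: V_r={v_r:.3f} V_g={v_g:.3f} lambda={lam:.3f}")
```

In this toy run the restart at episode 50 resets the policy to uniform and λ to zero, so the learner does not carry over a policy or multiplier tuned to the old reward. In restart-based methods the restart period is typically chosen to balance adaptation to drift against the cost of relearning.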

Published

2023-06-26

How to Cite

Ding, Y., & Lavaei, J. (2023). Provably Efficient Primal-Dual Reinforcement Learning for CMDPs with Non-stationary Objectives and Constraints. Proceedings of the AAAI Conference on Artificial Intelligence, 37(6), 7396-7404. https://doi.org/10.1609/aaai.v37i6.25900

Issue

Vol. 37 No. 6

Section

AAAI Technical Track on Machine Learning I