Policy-Based Primal-Dual Methods for Convex Constrained Markov Decision Processes

Authors

  • Donghao Ying UC Berkeley, Department of Industrial Engineering and Operations Research
  • Mengzi Amy Guo UC Berkeley, Department of Industrial Engineering and Operations Research
  • Yuhao Ding UC Berkeley, Department of Industrial Engineering and Operations Research
  • Javad Lavaei UC Berkeley, Department of Industrial Engineering and Operations Research
  • Zuo-Jun Shen UC Berkeley, Department of Industrial Engineering and Operations Research

DOI:

https://doi.org/10.1609/aaai.v37i9.26299

Keywords:

ML: Reinforcement Learning Theory, ML: Learning Theory, ML: Optimization, ML: Reinforcement Learning Algorithms

Abstract

We study convex Constrained Markov Decision Processes (CMDPs) in which the objective is concave and the constraints are convex in the state-action occupancy measure. We propose a policy-based primal-dual algorithm that updates the primal variable via policy gradient ascent and updates the dual variable via projected sub-gradient descent. Despite the loss of additivity structure and the nonconvex nature, we establish the global convergence of the proposed algorithm by leveraging a hidden convexity in the problem, and prove the O(T^-1/3) convergence rate in terms of both optimality gap and constraint violation. When the objective is strongly concave in the occupancy measure, we prove an improved convergence rate of O(T^-1/2). By introducing a pessimistic term to the constraint, we further show that a zero constraint violation can be achieved while preserving the same convergence rate for the optimality gap. This work is the first one in the literature that establishes non-asymptotic convergence guarantees for policy-based primal-dual methods for solving infinite-horizon discounted convex CMDPs.

Downloads

Published

2023-06-26

How to Cite

Ying, D., Guo, M. A., Ding, Y., Lavaei, J., & Shen, Z.-J. (2023). Policy-Based Primal-Dual Methods for Convex Constrained Markov Decision Processes. Proceedings of the AAAI Conference on Artificial Intelligence, 37(9), 10963-10971. https://doi.org/10.1609/aaai.v37i9.26299

Issue

Section

AAAI Technical Track on Machine Learning IV