A Provably-Efficient Model-Free Algorithm for Infinite-Horizon Average-Reward Constrained Markov Decision Processes

Honghao Wei; Xin Liu; Lei Ying

doi:10.1609/aaai.v36i4.20302

Authors

Honghao Wei, University of Michigan
Xin Liu, ShanghaiTech University
Lei Ying, University of Michigan

DOI:

https://doi.org/10.1609/aaai.v36i4.20302

Keywords:

Constraint Satisfaction And Optimization (CSO)

Abstract

This paper presents a model-free reinforcement learning (RL) algorithm for infinite-horizon average-reward Constrained Markov Decision Processes (CMDPs). For a sufficiently large learning horizon K, the proposed algorithm achieves sublinear regret and zero constraint violation. The bounds depend on the number of states S, the number of actions A, and two constants that are independent of K.
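
For readers unfamiliar with the terminology, the two guarantees can be written out as follows. This is a sketch in a common average-reward CMDP notation and is not taken from this page: the symbols r, g, \rho, \pi, and J^{*} are assumptions introduced here for illustration, and the paper's own definitions may differ in detail.

```latex
% Assumed notation (not from the abstract):
%   r = reward function, g = utility function, \rho = constraint threshold,
%   J^{*} = optimal average reward among policies that satisfy the constraint.
\begin{align*}
\text{CMDP:}\quad
  & \max_{\pi}\; J_r^{\pi} := \lim_{T\to\infty} \frac{1}{T}\,
    \mathbb{E}_{\pi}\Big[\sum_{t=1}^{T} r(s_t,a_t)\Big]
    \quad \text{s.t.}\quad
    J_g^{\pi} := \lim_{T\to\infty} \frac{1}{T}\,
    \mathbb{E}_{\pi}\Big[\sum_{t=1}^{T} g(s_t,a_t)\Big] \ge \rho, \\
\mathrm{Regret}(K)
  &= K J^{*} - \mathbb{E}\Big[\sum_{k=1}^{K} r(s_k,a_k)\Big], \\
\mathrm{Violation}(K)
  &= \mathbb{E}\Big[\sum_{k=1}^{K} \big(\rho - g(s_k,a_k)\big)\Big].
\end{align*}
```

Under these assumed definitions, sublinear regret means Regret(K)/K tends to 0 as K grows, and zero constraint violation means Violation(K) is at most 0 once K is sufficiently large.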

Published

2022-06-28

How to Cite

Wei, H., Liu, X., & Ying, L. (2022). A Provably-Efficient Model-Free Algorithm for Infinite-Horizon Average-Reward Constrained Markov Decision Processes. Proceedings of the AAAI Conference on Artificial Intelligence, 36(4), 3868-3876. https://doi.org/10.1609/aaai.v36i4.20302

Issue

Vol. 36 No. 4: AAAI-22 Technical Tracks 4

Section

AAAI Technical Track on Constraint Satisfaction and Optimization
