TY - JOUR
AU - Wei, Honghao
AU - Liu, Xin
AU - Ying, Lei
PY - 2022/06/28
Y2 - 2024/06/23
TI - A Provably-Efficient Model-Free Algorithm for Infinite-Horizon Average-Reward Constrained Markov Decision Processes
JF - Proceedings of the AAAI Conference on Artificial Intelligence
JA - AAAI
VL - 36
IS - 4
SE - AAAI Technical Track on Constraint Satisfaction and Optimization
DO - 10.1609/aaai.v36i4.20302
UR - https://ojs.aaai.org/index.php/AAAI/article/view/20302
SP - 3868
EP - 3876
AB - This paper presents a model-free reinforcement learning (RL) algorithm for infinite-horizon average-reward Constrained Markov Decision Processes (CMDPs). For a sufficiently large learning horizon K, the proposed algorithm achieves sublinear regret and zero constraint violation. The bounds depend on the number of states S, the number of actions A, and two constants that are independent of K.
ER -