BAI, Qinbo; MONDAL, Washim Uddin; AGGARWAL, Vaneet. Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes. Proceedings of the AAAI Conference on Artificial Intelligence, [S. l.], v. 38, n. 10, p. 10980–10988, 2024. DOI: 10.1609/aaai.v38i10.28973. Available at: https://ojs.aaai.org/index.php/AAAI/article/view/28973. Accessed: 24 Jul. 2026.