(1) Bai, Q.; Mondal, W. U.; Aggarwal, V. Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes. AAAI 2024, 38, 10980-10988.