Mandal, Debmalya, Goran Radanovic, Jiarui Gan, Adish Singla, and Rupak Majumdar. “Online Reinforcement Learning With Uncertain Episode Lengths.” Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 7 (June 26, 2023): 9064–9071. Accessed July 12, 2024. https://ojs.aaai.org/index.php/AAAI/article/view/26088.