Mandal, D., G. Radanovic, J. Gan, A. Singla, and R. Majumdar. “Online Reinforcement Learning With Uncertain Episode Lengths”. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 7, June 2023, pp. 9064-71, doi:10.1609/aaai.v37i7.26088.