Mandal, D., Radanovic, G., Gan, J., Singla, A., & Majumdar, R. (2023). Online Reinforcement Learning with Uncertain Episode Lengths. Proceedings of the AAAI Conference on Artificial Intelligence, 37(7), 9064-9071. https://doi.org/10.1609/aaai.v37i7.26088