Qin, Yuzhen, et al. “Stochastic Contextual Bandits with Long Horizon Rewards.” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 8, June 2023, pp. 9525-33, doi:10.1609/aaai.v37i8.26140.