[1] Y. Qin, Y. Li, F. Pasqualetti, M. Fazel, and S. Oymak, “Stochastic Contextual Bandits with Long Horizon Rewards”, AAAI, vol. 37, no. 8, pp. 9525–9533, Jun. 2023.