[1] G. Xiong, J. Li, and R. Singh, “Reinforcement Learning Augmented Asymptotically Optimal Index Policy for Finite-Horizon Restless Bandits,” AAAI, vol. 36, no. 8, pp. 8726–8734, Jun. 2022.