[1] G. Chen, S. C. Liew, and D. Gündüz, “GINO-Q: Learning an Asymptotically Optimal Index Policy for Restless Multi-armed Bandits,” AAAI, vol. 40, no. 24, pp. 20032-20040, Mar. 2026.