Xiao, Teng, and Suhang Wang. “Towards Off-Policy Learning for Ranking Policies With Logged Feedback.” Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8700–8707. Accessed May 7, 2026. https://ojs.aaai.org/index.php/AAAI/article/view/20849.