[1] T. Xiao and S. Wang, “Towards Off-Policy Learning for Ranking Policies with Logged Feedback,” AAAI, vol. 36, no. 8, pp. 8700–8707, Jun. 2022.