[1] R. Lopez, I. S. Dhillon, and M. I. Jordan, “Learning from eXtreme Bandit Feedback”, AAAI, vol. 35, no. 10, pp. 8732–8740, May 2021.