[1] R. Lopez, I. S. Dhillon, and M. I. Jordan, “Learning from eXtreme Bandit Feedback”, AAAI, vol. 35, no. 10, pp. 8732–8740, May 2021.