(1) Lopez, R.; Dhillon, I. S.; Jordan, M. I. Learning from eXtreme Bandit Feedback. AAAI 2021, 35, 8732-8740.