Lopez, R., Dhillon, I. S., & Jordan, M. I. (2021). Learning from eXtreme Bandit Feedback. Proceedings of the AAAI Conference on Artificial Intelligence, 35(10), 8732-8740. https://doi.org/10.1609/aaai.v35i10.17058