Lopez, R., I. S. Dhillon, and M. I. Jordan. “Learning from eXtreme Bandit Feedback”. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 10, May 2021, pp. 8732-40, doi:10.1609/aaai.v35i10.17058.