Lopez, Romain, Inderjit S. Dhillon, and Michael I. Jordan. 2021. “Learning from eXtreme Bandit Feedback.” Proceedings of the AAAI Conference on Artificial Intelligence 35 (10): 8732-40. https://doi.org/10.1609/aaai.v35i10.17058.