Knox, W. Bradley, Stephane Hatgis-Kessell, Sigurdur Orn Adalgeirsson, Serena Booth, Anca Dragan, Peter Stone, and Scott Niekum. “Learning Optimal Advantage from Preferences and Mistaking It for Reward.” Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 9 (March 24, 2024): 10066-10073. Accessed November 22, 2024. https://ojs.aaai.org/index.php/AAAI/article/view/28870.