Knox, W. B., S. Hatgis-Kessell, S. O. Adalgeirsson, S. Booth, A. Dragan, P. Stone, and S. Niekum. “Learning Optimal Advantage from Preferences and Mistaking It for Reward”. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 9, Mar. 2024, pp. 10066-73, doi:10.1609/aaai.v38i9.28870.