[1] K. Metcalf, M. Sarabia, M. Fedzechkina, and B.-J. Theobald, “Can You Rely on Synthetic Labellers in Preference-Based Reinforcement Learning? It’s Complicated”, AAAI, vol. 38, no. 9, pp. 10128–10136, Mar. 2024.