[1] H. Wang, “Efficient and Robust Reinforcement Learning from Human Feedback,” AAAI, vol. 39, no. 27, p. 28730, Apr. 2025.