(1) Zhou, J.; Ji, J.; Dai, J.; Yang, Y. Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback. AAAI 2025, 39, 27765-27773.