(1) Wang, A.; Ou, L.; Yu, Y.; Zhang, Z. Reward Model Evaluation via Automatically-Ranked Policy Alignment. AAAI 2026, 40, 26124-26132.