[1]
A. Wang, L. Ou, Y. Yu, and Z. Zhang, “Reward Model Evaluation via Automatically-Ranked Policy Alignment,” AAAI, vol. 40, no. 31, pp. 26124–26132, Mar. 2026.