Wang, Chenglong, Yongyu Mu, Hang Zhou, Yifu Huo, Ziming Zhu, Jiali Zeng, Murun Yang, et al. “GRAM-R²: Self-Training Generative Foundation Reward Models for Reward Reasoning.” Proceedings of the AAAI Conference on Artificial Intelligence 40, no. 39 (March 14, 2026): 33395–33403. Accessed May 15, 2026. https://ojs.aaai.org/index.php/AAAI/article/view/40626.