Wang, Chenglong, Yongyu Mu, Hang Zhou, Yifu Huo, Ziming Zhu, Jiali Zeng, Murun Yang, et al. 2026. “GRAM-R²: Self-Training Generative Foundation Reward Models for Reward Reasoning.” Proceedings of the AAAI Conference on Artificial Intelligence 40 (39): 33395–403. https://doi.org/10.1609/aaai.v40i39.40626.