Wang, C., Mu, Y., Zhou, H., Huo, Y., Zhu, Z., Zeng, J., … Xiao, T. (2026). GRAM-R²: Self-Training Generative Foundation Reward Models for Reward Reasoning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(39), 33395–33403. https://doi.org/10.1609/aaai.v40i39.40626