Wang, C. (2026) “GRAM-R²: Self-Training Generative Foundation Reward Models for Reward Reasoning”, Proceedings of the AAAI Conference on Artificial Intelligence, 40(39), pp. 33395–33403. doi: 10.1609/aaai.v40i39.40626.