(1) Wang, C.; Mu, Y.; Zhou, H.; Huo, Y.; Zhu, Z.; Zeng, J.; Yang, M.; Li, B.; Hao, X.; Zhang, C. GRAM-R²: Self-Training Generative Foundation Reward Models for Reward Reasoning. AAAI 2026, 40, 33395-33403.