[1] C. Wang, “GRAM-R²: Self-Training Generative Foundation Reward Models for Reward Reasoning,” AAAI, vol. 40, no. 39, pp. 33395–33403, Mar. 2026.