[1] C. Tian, “Rectify Evaluation Preference: Improving LLMs’ Critique on Math Reasoning via Perplexity-aware Reinforcement Learning”, AAAI, vol. 40, no. 39, pp. 33241–33249, Mar. 2026.