(1) Tian, C.; Lu, Z.; Qian, S.; Liu, N.; Li, P.; Jin, L.; Hu, L.; Zeng, Z.; Wang, S.; Zeng, K. Rectify Evaluation Preference: Improving LLMs’ Critique on Math Reasoning via Perplexity-Aware Reinforcement Learning. AAAI 2026, 40, 33241–33249.