(1) Li, X.; Li, X.; Hu, S.; Guo, Y.; Zhang, W. VerifyBench: A Systematic Benchmark for Evaluating Reasoning Verifiers Across Domains. AAAI 2026, 40, 31796-31804.