Li, X. (2026) “VerifyBench: A Systematic Benchmark for Evaluating Reasoning Verifiers Across Domains”, Proceedings of the AAAI Conference on Artificial Intelligence, 40(38), pp. 31796–31804. doi: 10.1609/aaai.v40i38.40448.