Li, Xuzhao, Xuchen Li, Shiyu Hu, Yongzhen Guo, and Wentao Zhang. “VerifyBench: A Systematic Benchmark for Evaluating Reasoning Verifiers Across Domains.” Proceedings of the AAAI Conference on Artificial Intelligence 40, no. 38 (March 14, 2026): 31796–31804. Accessed May 19, 2026. https://ojs.aaai.org/index.php/AAAI/article/view/40448.