Xia, S., Li, X., Liu, Y., Wu, T., & Liu, P. (2025). Evaluating Mathematical Reasoning Beyond Accuracy. Proceedings of the AAAI Conference on Artificial Intelligence, 39(26), 27723–27730. https://doi.org/10.1609/aaai.v39i26.34987