Ye, Y., Zhou, X., Chen, Z., Li, D., Gu, H., Zhou, J. P., & Zhou, D. (2026). K-12EduBench: A Benchmark for Evaluating Large Language Models’ Knowledge, Problem-Solving, and Educational Goal Cognition in K-12 Education. Proceedings of the AAAI Conference on Artificial Intelligence, 40(40), 34459–34466. https://doi.org/10.1609/aaai.v40i40.40744