Easy for Children, Hard for AI: The Limits of Multimodal LLMs in Early Childhood Learning

Jingping Liu; Xueyan Wu; Hanxuan Chen; Ziyan Liu; Zhangquan Chen; Ronghao Chen; Huacan Wang

doi:10.1609/aaai.v40i38.40479

Authors

Jingping Liu, Sun Yat-sen University
Xueyan Wu, East China University of Science and Technology
Hanxuan Chen, Hunan University
Ziyan Liu, East China University of Science and Technology
Zhangquan Chen, Tsinghua University
Ronghao Chen, Peking University
Huacan Wang, University of Chinese Academy of Sciences

Abstract

Early childhood is a critical stage for cognitive development, involving core skills such as visual perception and reasoning. While multimodal large language models (MLLMs) have made rapid progress on a wide range of general-purpose tasks, their ability to support early education remains underexplored. Existing research on child-related AI centers mostly on modeling language, emotion, or behavior, with limited attention to evaluating cognitive tasks relevant to early learning. To address this gap, we propose ChildBench, a multimodal benchmark designed to assess models on tasks inspired by early childhood cognitive development. It covers five key domains through ten tasks: spatial reasoning, visual reasoning, visual discrimination, counting skills, and visual tracking. The benchmark includes 4,890 carefully constructed images and 5,346 manually annotated samples, ensuring both diversity and age-appropriate content. We evaluate a range of state-of-the-art (SoTA) open-source and closed-source MLLMs, including GPT-4o, Gemini, and Qwen2.5-VL, on ChildBench. Despite strong performance on other benchmarks, the best 7B-parameter model with LoRA tuning achieves only 52.01% accuracy, far below the 96% achieved by 5-year-old children. These results reveal critical limitations in fine-grained perception and reasoning. We further analyze failure cases and discuss directions for future model development.
