VCR: A “Cone of Experience” Driven Synthetic Data Generation Framework for Mathematical Reasoning

Authors

  • Sannyuya Liu Central China Normal University
  • Jintian Feng Central China Normal University
  • Xiaoxuan Shen Central China Normal University
  • Shengyingjie Liu Central China Normal University
  • Qian Wan Central China Normal University
  • Jianwen Sun Central China Normal University

DOI:

https://doi.org/10.1609/aaai.v39i23.34645

Abstract

Large language models (LLMs) have shown excellent performance in natural language processing but still struggle with mathematical reasoning. As training paradigms gradually stabilize, researchers have proposed the concept of data-centric artificial intelligence, which emphasizes developing higher-quality data to empower LLMs. Existing studies construct synthetic data for mathematical reasoning by expanding public datasets and then use it for supervised fine-tuning of LLMs. However, these methods mostly focus on quantity while neglecting quality. Challenging samples receive inadequate attention during the data synthesis process, resulting in high construction costs, low quality density, and severe data homogenization. This paper proposes a multi-agent environment called Virtual ClassRoom (VCR), which leverages multiple LLM-driven agents to construct high-quality, diverse synthetic data. Inspired by the "Cone of Experience" educational theory, VCR introduces three experience levels (direct, iconic, and symbolic) into the data synthesis process by analogy with human learning. A user-friendly instruction set and role-playing system are carefully designed, enabling VCR to autonomously plan the scale of synthetic data. The system covers various educational scenarios, including lectures, discussions, problem design, and problem solving. The AdaBoost idea embodied in the global iterative process further promotes steady performance improvement. Extensive experiments show that the synthetic data generated by VCR have higher quality density and better generalization capability, enabling LLMs to achieve superior mathematical reasoning performance at the same data scale.
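The abstract does not say how the AdaBoost idea is realized inside VCR's global iterative process. As a rough illustration only, the sketch below applies the textbook AdaBoost sample-reweighting step to a pool of seed problems, under the assumption that problems the fine-tuned model answers incorrectly should receive more weight (and hence a larger share of the next round's synthesis budget). The functions `adaboost_reweight` and `allocate_budget` and the idea of splitting a synthesis budget by weight are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def adaboost_reweight(weights: Sequence[float], correct: Sequence[bool]) -> List[float]:
    """One textbook AdaBoost round: up-weight problems the model got wrong.

    Hypothetical helper; VCR's actual update rule is not given in the abstract.
    """
    if len(weights) != len(correct):
        raise ValueError("weights and correct must have the same length")
    total = sum(weights)
    w = [x / total for x in weights]  # normalize to a distribution
    # Weighted error: share of the weight sitting on failed problems.
    eps = sum(wi for wi, ok in zip(w, correct) if not ok)
    eps = min(max(eps, 1e-10), 1 - 1e-10)  # clamp to avoid log(0)
    alpha = 0.5 * math.log((1 - eps) / eps)
    # Correct answers shrink by exp(-alpha); wrong answers grow by exp(alpha).
    new = [wi * math.exp(-alpha if ok else alpha) for wi, ok in zip(w, correct)]
    z = sum(new)
    return [x / z for x in new]


def allocate_budget(weights: Sequence[float], budget: int) -> List[int]:
    """Hypothetical: split a synthesis budget in proportion to the weights.

    Uses the largest-remainder method so the counts always sum to `budget`.
    """
    raw = [wi * budget for wi in weights]
    counts = [math.floor(r) for r in raw]
    remainder = budget - sum(counts)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:remainder]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    # Four seed problems; the model fails only the last one.
    w = adaboost_reweight([1, 1, 1, 1], [True, True, True, False])
    print([round(x, 4) for x in w])
    print(allocate_budget(w, 12))
```

With one failure out of four equally weighted problems, the weighted error is 0.25, and after the update the failed problem carries half of the total weight (1/2 versus 1/6 for each solved problem), so it would receive 6 of 12 new samples in this sketch. This mirrors the abstract's stated concern that challenging samples deserve more attention, but the concrete weighting scheme here is an assumption.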

Published

2025-04-11

How to Cite

Liu, S., Feng, J., Shen, X., Liu, S., Wan, Q., & Sun, J. (2025). VCR: A “Cone of Experience” Driven Synthetic Data Generation Framework for Mathematical Reasoning. Proceedings of the AAAI Conference on Artificial Intelligence, 39(23), 24650–24658. https://doi.org/10.1609/aaai.v39i23.34645

Issue

Vol. 39 No. 23

Section

AAAI Technical Track on Natural Language Processing II