Decoupling Understanding from Reasoning via Problem Space Mapping for Small-Scale Model Reasoning

Authors

  • Li Wang, Beihang University, Beijing, China
  • Changhao Zhang, UCL Hawkes Institute and Department of Medical Physics and Biomedical Engineering, University College London, UK
  • Zengqi Xiu, Beihang University, Beijing, China
  • Kai Lu, State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
  • Xin Yu, Beihang University, Beijing, China
  • Kui Zhang, Beihang University, Beijing, China
  • Wenjun Wu, Beihang University, Beijing, China; Hangzhou International Innovation Institute, Beihang University, Hangzhou, China; Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, Beihang University, Beijing, China

DOI:

https://doi.org/10.1609/aaai.v40i39.40646

Abstract

Despite recent advances in the reasoning capabilities of Large Language Models (LLMs), improving the reasoning ability of Small Language Models (SLMs, e.g., up to 1.5B parameters) remains challenging. A key obstacle lies in the complexity and variability of natural language: essentially equivalent problems appear in diverse surface forms and are often obscured by redundant or distracting details. This imposes a dual burden on SLMs: they must first extract the core problem from complex linguistic input, and then perform reasoning based on that understanding. The resulting vast and noisy problem space hinders optimization, particularly for models with limited capacity. To address this, we propose a new framework that decouples understanding from reasoning by mapping natural language problems into a canonical problem space, a semantically simplified yet expressive domain. This enables SLMs to focus on reasoning over standardized inputs, free from linguistic variability. Within this framework, we introduce DURIT (Decoupled Understanding from Reasoning via Iterative Training), a three-step algorithm that iteratively (1) maps natural language problems into the problem space via reinforcement learning, (2) aligns reasoning trajectories through self-distillation, and (3) trains reasoning policies in the problem space. The mapper and reasoner are co-trained in an alternating loop throughout this process. Experiments show that DURIT substantially improves SLMs' performance on both in-domain and out-of-domain mathematical and logical reasoning tasks. Beyond improving reasoning capabilities, DURIT also improves the robustness of reasoning, validating the decoupling of understanding from reasoning as an effective strategy for strengthening SLMs.

Published

2026-03-14

How to Cite

Wang, L., Zhang, C., Xiu, Z., Lu, K., Yu, X., Zhang, K., & Wu, W. (2026). Decoupling Understanding from Reasoning via Problem Space Mapping for Small-Scale Model Reasoning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(39), 33575–33583. https://doi.org/10.1609/aaai.v40i39.40646

Issue

Section

AAAI Technical Track on Natural Language Processing IV