Decoupling Understanding from Reasoning via Problem Space Mapping for Small-Scale Model Reasoning

Li Wang; Changhao Zhang; Zengqi Xiu; Kai Lu; Xin Yu; Kui Zhang; Wenjun Wu

doi:10.1609/aaai.v40i39.40646

Authors

Li Wang Beihang University, Beijing, China
Changhao Zhang UCL Hawkes Institute and Department of Medical Physics and Biomedical Engineering, University College London, UK
Zengqi Xiu Beihang University, Beijing, China
Kai Lu State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
Xin Yu Beihang University, Beijing, China
Kui Zhang Beihang University, Beijing, China
Wenjun Wu Beihang University, Beijing, China; Hangzhou International Innovation Institute, Beihang University, Hangzhou, China; Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, Beihang University, Beijing, China

DOI:

https://doi.org/10.1609/aaai.v40i39.40646

Abstract

Despite recent advances in the reasoning capabilities of Large Language Models (LLMs), improving the reasoning ability of Small Language Models (SLMs, e.g., up to 1.5B parameters) remains challenging. A key obstacle lies in the complexity and variability of natural language: essentially equivalent problems appear in diverse surface forms and are often obscured by redundant or distracting details. This imposes a dual burden on SLMs: they must first extract the core problem from complex linguistic input and then reason on the basis of that understanding. The resulting vast and noisy problem space hinders optimization, particularly for models with limited capacity. To address this, we propose a new framework that decouples understanding from reasoning by mapping natural language problems into a canonical problem space, a semantically simplified yet expressive domain. This enables SLMs to focus on reasoning over standardized inputs, free from linguistic variability. Within this framework, we introduce DURIT (Decoupled Understanding from Reasoning via Iterative Training), a three-step algorithm that iteratively (1) maps natural language problems into the canonical problem space via reinforcement learning, (2) aligns reasoning trajectories through self-distillation, and (3) trains reasoning policies in the problem space. The mapper and reasoner are co-trained in an alternating loop throughout this process. Experiments show that DURIT substantially improves SLMs' performance on both in-domain and out-of-domain mathematical and logical reasoning tasks. Beyond improving reasoning capabilities, DURIT also improves the robustness of reasoning, validating the decoupling of understanding from reasoning as an effective strategy for strengthening SLMs.
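
The alternating three-step loop described in the abstract can be pictured with a minimal Python sketch. This is an illustration of the control flow only, not the authors' implementation: the names `toy_mapper` and `durit_schedule` are hypothetical, the toy mapper merely normalizes case and whitespace as a stand-in for a learned mapper, and no reinforcement learning, self-distillation, or policy training is actually performed.

```python
from typing import Callable, List

def toy_mapper(problem: str) -> str:
    # Hypothetical stand-in for the learned mapper: lowercases the text and
    # collapses whitespace. The real mapper is trained, not rule-based.
    return " ".join(problem.lower().split())

def durit_schedule(
    problems: List[str],
    num_rounds: int,
    mapper: Callable[[str], str] = toy_mapper,
) -> List[str]:
    """Return a log of the order in which the three steps would run.

    Only the step order is recorded; no model is trained in this sketch.
    """
    log: List[str] = []
    for round_idx in range(num_rounds):
        # Step 1: map natural language problems into a canonical form.
        canonical = [mapper(p) for p in problems]
        log.append(f"round {round_idx}: step 1 mapped {len(canonical)} problems")
        # Step 2: placeholder for aligning reasoning trajectories.
        log.append(f"round {round_idx}: step 2 aligned {len(canonical)} trajectories")
        # Step 3: placeholder for training the reasoner on canonical problems.
        log.append(f"round {round_idx}: step 3 trained on {len(canonical)} canonical problems")
    return log

if __name__ == "__main__":
    for line in durit_schedule(["  What IS 2 + 2? ", "Compute 3*3"], num_rounds=2):
        print(line)
```

Each round runs the mapper step before the two reasoner-side steps, which is one plausible reading of "co-trained in an alternating loop"; the abstract does not specify the number of rounds or any stopping criterion.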
