Beyond Step Pruning: Information Theory Based Step-level Optimization for Self-Refining Large Language Models
DOI:
https://doi.org/10.1609/aaai.v40i41.40798

Abstract
Large language models (LLMs) have shown impressive capabilities on natural language tasks, yet they continue to struggle with multi-step mathematical reasoning, where correctness depends on a precise chain of intermediate steps. Preference optimization methods such as Direct Preference Optimization (DPO) have improved answer-level alignment, but they largely overlook the reasoning process itself, providing little supervision over the intermediate steps that are critical for complex problem-solving. Existing fine-grained approaches typically rely on strong annotators or reward models to assess the quality of individual steps; reward models, however, are vulnerable to reward hacking. To address this, we propose ISLA, a reward-model-free framework that constructs step-level preference data directly from SFT gold traces. ISLA also introduces a self-improving pruning mechanism that identifies informative steps based on two signals: a step's marginal contribution to final accuracy (relative accuracy) and the model's uncertainty, inspired by the concept of information gain. Empirically, ISLA outperforms DPO while using only 12% of the training tokens, demonstrating that careful step-level selection can significantly improve both reasoning accuracy and training efficiency.
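The abstract describes scoring each reasoning step by two signals (relative accuracy and model uncertainty) and pruning to the most informative ones. The paper's exact scoring function is not given here, so the sketch below is purely illustrative: it combines a hypothetical marginal-accuracy term with mean token negative log-likelihood as an uncertainty proxy, and keeps the top-scoring fraction of steps. All function names, weights, and the additive combination are assumptions, not the authors' method.

```python
import math

def step_scores(acc_with, acc_without, step_token_probs):
    """Hypothetical per-step scores combining the abstract's two signals.

    acc_with[i]      -- final-answer accuracy when step i is included
    acc_without[i]   -- final-answer accuracy when step i is removed
    step_token_probs -- per-step lists of the model's token probabilities

    The additive combination is an illustrative placeholder, not ISLA's
    actual formulation.
    """
    scores = []
    for a_w, a_wo, probs in zip(acc_with, acc_without, step_token_probs):
        rel_acc = a_w - a_wo  # marginal contribution to final accuracy
        # uncertainty proxy: mean negative log-likelihood of the step's tokens
        uncertainty = -sum(math.log(p) for p in probs) / len(probs)
        scores.append(rel_acc + uncertainty)
    return scores

def prune_steps(steps, scores, keep_ratio=0.12):
    """Keep the highest-scoring fraction of steps; the 0.12 default echoes
    the reported 12% token budget but is only a placeholder here."""
    k = max(1, int(len(steps) * keep_ratio))
    ranked = sorted(range(len(steps)), key=lambda i: scores[i], reverse=True)
    return [steps[i] for i in sorted(ranked[:k])]
```

A usage sketch: given four candidate steps with precomputed accuracies and token probabilities, `prune_steps(steps, step_scores(...), keep_ratio=0.5)` would retain the two steps whose combined accuracy gain and uncertainty are highest, in their original order.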
Published
2026-03-14
How to Cite
Zhao, J., Min, E., Wu, H., Li, Z., Sun, Z., Cai, H., … Penn, G. (2026). Beyond Step Pruning: Information Theory Based Step-level Optimization for Self-Refining Large Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(41), 34941–34949. https://doi.org/10.1609/aaai.v40i41.40798
Issue
Section
AAAI Technical Track on Natural Language Processing VI