Efficient Thought Space Exploration Through Strategic Intervention

Ziheng Li; Hengyi Cai; Xiaochi Wei; Yuchen Li; Shuaiqiang Wang; Zhi-Hong Deng; Dawei Yin

doi:10.1609/aaai.v40i38.40459

Authors

Ziheng Li State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University
Hengyi Cai Baidu Inc.
Xiaochi Wei Baidu Inc.
Yuchen Li Baidu Inc.
Shuaiqiang Wang Baidu Inc.
Zhi-Hong Deng State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University
Dawei Yin Baidu Inc.

DOI:

https://doi.org/10.1609/aaai.v40i38.40459

Abstract

While large language models (LLMs) demonstrate emerging reasoning capabilities, current inference-time expansion methods incur prohibitive computational costs through exhaustive sampling. By analyzing decoding trajectories, we observe that most next-token predictions align well with the golden output, except for a few critical tokens that lead to deviations. Inspired by this phenomenon, we propose a novel Hint-Practice Reasoning (HPR) framework that operationalizes this insight through two synergistic components: 1) a hinter (a powerful LLM) that provides probabilistic guidance at critical decision points, and 2) a practitioner (an efficient smaller model) that executes the major reasoning steps. The framework's core innovation lies in Distributional Inconsistency Reduction (DIR), a theoretically grounded metric that dynamically identifies intervention points by quantifying the divergence between the practitioner's reasoning trajectory and the hinter's expected distribution in a tree-structured probabilistic space. Through iterative tree updates guided by DIR, HPR reweights promising reasoning paths while deprioritizing low-probability branches. Experiments across arithmetic and commonsense reasoning benchmarks demonstrate HPR's state-of-the-art efficiency-accuracy tradeoffs: it achieves performance comparable to self-consistency and MCTS baselines while decoding only one-fifth of the tokens, and outperforms existing methods by up to 5.1% absolute accuracy while maintaining similar or lower FLOPs.
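The abstract does not give the formula for DIR, so the following is only a minimal sketch of the general idea of locating critical decision points by comparing two models' next-token distributions. It assumes plain KL divergence as a stand-in for the divergence measure and a fixed threshold; the function names, the threshold value, and the toy distributions are all hypothetical and are not taken from the paper.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists.

    Assumption: KL divergence is used here only as an illustrative stand-in;
    it is not the paper's DIR metric.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def find_intervention_points(hinter_dists, practitioner_dists, threshold=0.5):
    """Return the decoding steps where the two models disagree strongly.

    Each argument is a list of next-token distributions, one per step.
    The threshold is a hypothetical hyperparameter chosen for this toy example.
    """
    return [
        step
        for step, (h, p) in enumerate(zip(hinter_dists, practitioner_dists))
        if kl_divergence(h, p) > threshold
    ]


# Toy next-token distributions over a 3-token vocabulary (made-up values).
hinter = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.7, 0.2, 0.1]]
practitioner = [[0.85, 0.1, 0.05], [0.8, 0.1, 0.1], [0.6, 0.3, 0.1]]

# Only the middle step shows a large disagreement between the two models.
points = find_intervention_points(hinter, practitioner)
print(points)
```

In this toy setting, the practitioner would decode steps 0 and 2 on its own, and the hinter would be consulted only at the flagged step, which mirrors the abstract's observation that most tokens need no intervention.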
