TIV: Thought Injection via Vectors for Efficient Reasoning in Large Reasoning Models

Yi Cao; Weijie Shi; Wei-Jie Xu; Yucheng Shen; Yue Cui; Hanghui Guo; Shimin Di; Ziyi Liu; Jiaming Li; Alexander Zhou; Jia Zhu; Jiajie Xu

doi:10.1609/aaai.v40i36.40264

Authors

Yi Cao: School of Computer Science and Technology, Soochow University; Key Laboratory of Data Intelligence and Advanced Computing, Soochow University
Weijie Shi: Department of Computer Science and Engineering, The Hong Kong University of Science and Technology
Wei-Jie Xu: School of Artificial Intelligence, Nanjing University
Yucheng Shen: School of Computer Science and Technology, Soochow University
Yue Cui: Alibaba Group
Hanghui Guo: Zhejiang Key Laboratory of Intelligent Education Technology and Application, Zhejiang Normal University
Shimin Di: School of Computer Science and Engineering, Southeast University
Ziyi Liu: Department of Computer Science and Engineering, The Hong Kong University of Science and Technology
Jiaming Li: ByteDance
Alexander Zhou: Department of Computing, The Hong Kong Polytechnic University
Jia Zhu: Zhejiang Key Laboratory of Intelligent Education Technology and Application, Zhejiang Normal University
Jiajie Xu: School of Computer Science and Technology, Soochow University; Key Laboratory of Data Intelligence and Advanced Computing, Soochow University

Abstract

Large Reasoning Models (LRMs) have recently demonstrated impressive performance across a range of reasoning tasks by generating intermediate thoughts. However, these models can suffer from overthinking: they generate excessive tokens that contribute little to final accuracy while increasing inference cost. To mitigate this, we propose TIV (Thought Injection via Vectors), a framework that compresses token-level reasoning into compact vectors without sacrificing performance. Rather than generating explicit thoughts, TIV injects learnable vectors into the post-attention hidden states of the final token across Transformer layers, enabling implicit, lightweight reasoning. We further introduce a two-stage reinforcement learning strategy: the first stage calibrates the model's reasoning distribution, and the second distills it into a vector-based policy optimized for both accuracy and brevity. Experiments on three reasoning benchmarks show that TIV preserves over 99% of the original accuracy while reducing output length by more than 65% on average, and by up to 80% in some cases. Moreover, TIV consistently achieves better accuracy-efficiency trade-offs than existing methods, establishing it as a state-of-the-art (SOTA) approach for efficient reasoning in LRMs.
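To make the injection mechanism described in the abstract more concrete, here is a minimal, hypothetical sketch of adding one vector per layer to the final token's post-attention hidden state. It is not the authors' implementation. Everything in it is an assumption for illustration: the helper names `causal_mean_attention`, `inject_final_token`, and `forward_with_injection` are invented here, a toy causal averaging step stands in for a real attention sublayer, there is no MLP or normalization, and the injected vectors are fixed constants rather than vectors learned through the paper's two-stage reinforcement learning.

```python
from typing import Callable, List

Vector = List[float]
Hidden = List[Vector]  # one hidden-state vector per token position


def causal_mean_attention(hidden: Hidden) -> Hidden:
    """Toy stand-in (hypothetical) for an attention sublayer with a residual:
    each position adds the uniform average of itself and all earlier positions."""
    out: Hidden = []
    for i, row in enumerate(hidden):
        prefix = hidden[: i + 1]
        mean = [sum(col) / len(prefix) for col in zip(*prefix)]
        out.append([x + m for x, m in zip(row, mean)])
    return out


def inject_final_token(hidden: Hidden, vec: Vector) -> Hidden:
    """Add `vec` to the last token's hidden state only; other positions are copied."""
    out = [row[:] for row in hidden]
    out[-1] = [x + v for x, v in zip(out[-1], vec)]
    return out


def forward_with_injection(
    hidden: Hidden,
    layers: List[Callable[[Hidden], Hidden]],
    injected: List[Vector],
) -> Hidden:
    """Run each (toy) layer, then add that layer's vector to the final token's
    post-attention state. Assumes exactly one injected vector per layer."""
    if len(layers) != len(injected):
        raise ValueError("need exactly one injected vector per layer")
    for layer, vec in zip(layers, injected):
        hidden = layer(hidden)                   # post-attention hidden states
        hidden = inject_final_token(hidden, vec)  # inject at the last position only
    return hidden


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three tokens, hidden size 2
    layers = [causal_mean_attention, causal_mean_attention]
    zeros = [[0.0, 0.0], [0.0, 0.0]]           # no injection
    vecs = [[0.5, -0.5], [0.1, 0.2]]           # illustrative, not learned
    base = forward_with_injection(x, layers, zeros)
    steered = forward_with_injection(x, layers, vecs)
    print("final token without injection:", base[-1])
    print("final token with injection:   ", steered[-1])
```

Because the toy attention is causal, the injected vectors change only the final position: earlier tokens' states are identical to the uninjected run, while the last token's state, from which the next output token would be predicted, is shifted at every layer.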