TIV: Thought Injection via Vectors for Efficient Reasoning in Large Reasoning Models

Authors

  • Yi Cao, School of Computer Science and Technology, Soochow University; Key Laboratory of Data Intelligence and Advanced Computing, Soochow University
  • Weijie Shi, Department of Computer Science and Engineering, The Hong Kong University of Science and Technology
  • Wei-Jie Xu, School of Artificial Intelligence, Nanjing University
  • Yucheng Shen, School of Computer Science and Technology, Soochow University
  • Yue Cui, Alibaba Group
  • Hanghui Guo, Zhejiang Key Laboratory of Intelligent Education Technology and Application, Zhejiang Normal University
  • Shimin Di, School of Computer Science and Engineering, Southeast University
  • Ziyi Liu, Department of Computer Science and Engineering, The Hong Kong University of Science and Technology
  • Jiaming Li, ByteDance
  • Alexander Zhou, Department of Computing, The Hong Kong Polytechnic University
  • Jia Zhu, Zhejiang Key Laboratory of Intelligent Education Technology and Application, Zhejiang Normal University
  • Jiajie Xu, School of Computer Science and Technology, Soochow University; Key Laboratory of Data Intelligence and Advanced Computing, Soochow University

DOI:

https://doi.org/10.1609/aaai.v40i36.40264

Abstract

Large Reasoning Models (LRMs) have recently demonstrated impressive performance across a range of reasoning tasks by generating intermediate thoughts. However, these models can suffer from overthinking—generating excessive tokens that contribute little to final accuracy while increasing inference cost. To mitigate this, we propose TIV (Thought Injection via Vectors), an innovative framework that compresses token-level reasoning into compact vectors without sacrificing performance. Rather than generating explicit thoughts, TIV injects learnable vectors into the post-attention hidden states of the final token across Transformer layers, enabling implicit and lightweight reasoning. We further introduce a two-stage reinforcement learning strategy: the first stage calibrates the model's reasoning distribution, and the second distills it into a vector-based policy optimized for both accuracy and brevity. Experiments on three reasoning benchmarks show that TIV preserves over 99% of the original accuracy while reducing output length by more than 65% on average, reaching up to 80% in some cases. Moreover, TIV consistently achieves superior trade-offs between accuracy and efficiency compared to existing methods, distinguishing itself as a state-of-the-art (SOTA) approach for efficient reasoning in LRMs.
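
The injection step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how the vectors are combined with the hidden states, so element-wise addition is an assumption here, and the names inject_into_final_token, forward_with_injection, and attention_fn are hypothetical. Hidden states are plain Python lists, with one learnable vector per layer applied only to the last token position after the (stand-in) attention step.

```python
from typing import Callable, List

Vector = List[float]
HiddenStates = List[Vector]  # one hidden vector per token position


def inject_into_final_token(post_attention: HiddenStates, thought_vector: Vector) -> HiddenStates:
    """Return a copy of the hidden states with thought_vector added to the last token only.

    Additive injection is an assumption; the abstract only says vectors are "injected".
    """
    if not post_attention:
        raise ValueError("need at least one token position")
    if len(post_attention[-1]) != len(thought_vector):
        raise ValueError("thought vector must match the hidden size")
    updated = [list(row) for row in post_attention]  # copy so the input is not mutated
    updated[-1] = [h + v for h, v in zip(updated[-1], thought_vector)]
    return updated


def forward_with_injection(
    hidden: HiddenStates,
    layer_vectors: List[Vector],
    attention_fn: Callable[[HiddenStates], HiddenStates],
) -> HiddenStates:
    """Run one stand-in attention step per layer, injecting that layer's vector afterwards."""
    for vector in layer_vectors:
        post_attention = attention_fn(hidden)  # placeholder for a real attention sublayer
        hidden = inject_into_final_token(post_attention, vector)
    return hidden


if __name__ == "__main__":
    tokens = [[1.0, 2.0], [3.0, 4.0]]        # two token positions, hidden size 2
    vectors = [[0.5, 0.5], [1.0, -1.0]]      # one hypothetical learned vector per layer
    identity_attention = lambda states: states  # stand-in; a real model mixes positions here
    print(forward_with_injection(tokens, vectors, identity_attention))
```

With the identity stand-in for attention, only the final token's hidden state changes across layers; earlier positions pass through untouched, which is the property the abstract attributes to the injection.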

Published

2026-03-14

How to Cite

Cao, Y., Shi, W., Xu, W.-J., Shen, Y., Cui, Y., Guo, H., … Xu, J. (2026). TIV: Thought Injection via Vectors for Efficient Reasoning in Large Reasoning Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(36), 30148–30155. https://doi.org/10.1609/aaai.v40i36.40264

Section

AAAI Technical Track on Natural Language Processing I