BadThink: Triggered Overthinking Attacks on Chain-of-Thought Reasoning in Large Language Models

Authors

  • Shuaitong Liu, College of Computer and Information Science, Software College, Southwest University, Chongqing, China
  • Renjue Li, Institute of AI for Industries, Chinese Academy of Sciences, Nanjing, China
  • Lijia Yu, Institute of AI for Industries, Chinese Academy of Sciences, Nanjing, China
  • Lijun Zhang, Institute of Software, Chinese Academy of Sciences, Beijing, China
  • Zhiming Liu, College of Computer and Information Science, Software College, Southwest University, Chongqing, China
  • Gaojie Jin, Department of Computer Science, University of Exeter, UK

DOI:

https://doi.org/10.1609/aaai.v40i38.40486

Abstract

Recent advances in Chain-of-Thought (CoT) prompting have substantially improved the reasoning capabilities of large language models (LLMs), but they have also exposed computational efficiency as a new attack surface. In this paper, we propose BadThink, the first backdoor attack designed to deliberately induce "overthinking" behavior in CoT-enabled LLMs while ensuring stealth. When activated by carefully crafted trigger prompts, BadThink manipulates the model into generating inflated reasoning traces, producing unnecessarily redundant thought processes while preserving the consistency of final outputs. This subtle attack vector creates a covert form of performance degradation that significantly increases computational cost and inference time while remaining difficult to detect through conventional output evaluation methods. We implement the attack through a poisoning-based fine-tuning strategy that employs a novel LLM-based iterative optimization process to generate highly naturalistic poisoned data, which embeds the behavior in the model. Our experiments on multiple state-of-the-art models and reasoning tasks show that BadThink consistently increases reasoning trace length, by more than 17× on the MATH-500 dataset, while remaining stealthy and robust. This work reveals a critical, previously unexplored vulnerability in which reasoning efficiency can be covertly manipulated, demonstrating a new class of attacks against CoT-enabled systems.

Published

2026-03-14

How to Cite

Liu, S., Li, R., Yu, L., Zhang, L., Liu, Z., & Jin, G. (2026). BadThink: Triggered Overthinking Attacks on Chain-of-Thought Reasoning in Large Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(38), 32141–32149. https://doi.org/10.1609/aaai.v40i38.40486

Section

AAAI Technical Track on Natural Language Processing III