BadThink: Triggered Overthinking Attacks on Chain-of-Thought Reasoning in Large Language Models

Shuaitong Liu; Renjue Li; Lijia Yu; Lijun Zhang; Zhiming Liu; Gaojie Jin

doi:10.1609/aaai.v40i38.40486

Authors

Shuaitong Liu, College of Computer and Information Science, Software College, Southwest University, Chongqing, China
Renjue Li, Institute of AI for Industries, Chinese Academy of Sciences, Nanjing, China
Lijia Yu, Institute of AI for Industries, Chinese Academy of Sciences, Nanjing, China
Lijun Zhang, Institute of Software, Chinese Academy of Sciences, Beijing, China
Zhiming Liu, College of Computer and Information Science, Software College, Southwest University, Chongqing, China
Gaojie Jin, Department of Computer Science, University of Exeter, UK

Abstract

Recent advances in Chain-of-Thought (CoT) prompting have substantially improved the reasoning capabilities of large language models (LLMs), but they have also exposed computational efficiency as a new attack surface. In this paper, we propose BadThink, the first backdoor attack designed to deliberately induce "overthinking" behavior in CoT-enabled LLMs while remaining stealthy. When activated by carefully crafted trigger prompts, BadThink manipulates the model into generating inflated reasoning traces, with unnecessarily redundant thought processes, while preserving the consistency of final outputs. This attack vector creates a covert form of performance degradation: it significantly increases computational cost and inference time, yet it is difficult to detect through conventional output evaluation methods. We implement the attack through a poisoning-based fine-tuning strategy that uses a novel LLM-based iterative optimization process to generate highly naturalistic poisoned data and embed the behavior. Our experiments on multiple state-of-the-art models and reasoning tasks show that BadThink consistently increases reasoning trace lengths, by more than 17× on the MATH-500 dataset, while remaining stealthy and robust. This work reveals a critical, previously unexplored vulnerability in which reasoning efficiency can be covertly manipulated, demonstrating a new class of attacks against CoT-enabled systems.
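
The two quantities the abstract contrasts, longer reasoning traces and unchanged final answers, can be measured side by side. The sketch below is an illustration only and is not taken from the paper: the function names, the whitespace-based token count, and the exact-match answer comparison are all assumptions, and the paper's own metrics may be defined differently.

```python
from statistics import mean


def trace_length(trace: str) -> int:
    # Assumption: whitespace-separated words stand in for model tokens.
    return len(trace.split())


def overthinking_report(clean, triggered):
    """Compare clean and triggered runs on the same questions.

    Both arguments are lists of (reasoning_trace, final_answer) pairs,
    aligned so that index i refers to the same question in each list.
    (Hypothetical data layout, chosen for illustration.)
    """
    if not clean or len(clean) != len(triggered):
        raise ValueError("need two non-empty, equally sized result lists")

    clean_len = mean(trace_length(t) for t, _ in clean)
    trig_len = mean(trace_length(t) for t, _ in triggered)
    if clean_len == 0:
        raise ValueError("clean traces are empty; ratio is undefined")

    # Fraction of questions whose final answer is unchanged by the trigger.
    same = sum(a == b for (_, a), (_, b) in zip(clean, triggered))

    return {
        "inflation_ratio": trig_len / clean_len,
        "answer_consistency": same / len(clean),
    }
```

A high inflation ratio combined with answer consistency near 1.0 is the pattern the abstract describes: accuracy-only evaluation would see nothing unusual, while the cost of each triggered query grows.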
