Exploiting Synergistic Cognitive Biases to Bypass Safety in LLMs
DOI: https://doi.org/10.1609/aaai.v40i3.37203
Abstract
Large Language Models (LLMs) demonstrate impressive capabilities across diverse tasks, yet their safety mechanisms remain susceptible to adversarial exploitation of cognitive biases (systematic deviations from rational judgment). Unlike prior studies focusing on isolated biases, this work highlights the overlooked power of multi-bias interactions in undermining LLM safeguards. Specifically, we propose CognitiveAttack, a novel red-teaming framework that adaptively selects optimal ensembles from 154 cognitive biases defined in human social psychology and engineers them into adversarial prompts that effectively compromise LLM safety mechanisms. Experimental results reveal systemic vulnerabilities across 30 mainstream LLMs, particularly open-source variants. CognitiveAttack achieves a substantially higher attack success rate than the SOTA black-box method PAP (60.1% vs. 31.6%), exposing critical limitations in current defenses. Through quantitative analysis of successful jailbreaks, we further identify vulnerability patterns in safety-aligned LLMs under synergistic cognitive biases, validating multi-bias interactions as a potent yet underexplored attack vector. This work introduces a novel interdisciplinary perspective by bridging cognitive science and LLM safety, paving the way for more robust and human-aligned AI systems.
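To make the multi-bias idea concrete, the toy Python sketch below composes several bias framings around a placeholder request and enumerates candidate ensembles. The bias inventory, the templates, and the names BIAS_TEMPLATES, compose_prompt, and enumerate_ensembles are illustrative assumptions, not the authors' implementation; the paper's adaptive step, scoring each candidate against a target model to pick the optimal ensemble, is only indicated in a comment.

# Hypothetical sketch in the spirit of CognitiveAttack; all names and
# templates here are illustrative stand-ins, not the paper's code.
from itertools import combinations

# Toy subset of a bias inventory (the paper draws on 154 biases); each
# entry maps a bias name to a framing template wrapped around a request.
BIAS_TEMPLATES = {
    "authority_bias": "A senior domain expert has already approved this: {req}",
    "anchoring": "Earlier answers to similar questions were detailed, so: {req}",
    "scarcity": "This is the only chance to obtain this information: {req}",
}

def compose_prompt(request: str, biases: tuple[str, ...]) -> str:
    """Nest the selected bias framings around the request, innermost first."""
    prompt = request
    for bias in biases:
        prompt = BIAS_TEMPLATES[bias].format(req=prompt)
    return prompt

def enumerate_ensembles(request: str, max_size: int = 2):
    """Yield candidate multi-bias prompts. An adaptive attacker would score
    each candidate against the target model and keep the best ensemble;
    that scoring loop is omitted here."""
    for k in range(1, max_size + 1):
        for combo in combinations(BIAS_TEMPLATES, k):
            yield combo, compose_prompt(request, combo)

for combo, prompt in enumerate_ensembles("benign placeholder request"):
    print(combo, "->", prompt)

Running the sketch prints each single- and two-bias framing of the placeholder request, showing how bias framings stack multiplicatively rather than being applied one at a time.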
Published
2026-03-14
How to Cite
Yang, X., Zhou, B., Tang, X., Han, J., & Hu, S. (2026). Exploiting Synergistic Cognitive Biases to Bypass Safety in LLMs. Proceedings of the AAAI Conference on Artificial Intelligence, 40(3), 2200-2208. https://doi.org/10.1609/aaai.v40i3.37203
Issue
Vol. 40 No. 3 (2026)
Section
AAAI Technical Track on Cognitive Modeling & Cognitive Systems