TVChain: Leveraging Textual-Visual Prompt Chains for Jailbreaking Large Vision-Language Models

Authors

  • Hao Yu National University of Defense Technology
  • Ke Liang National University of Defense Technology
  • Junxian Duan Institute of Automation, Chinese Academy of Sciences
  • Jun Wang National University of Defense Technology
  • Siwei Wang Intelligent Game and Decision Lab
  • Chuan Ma Chongqing University
  • Xinwang Liu National University of Defense Technology

DOI:

https://doi.org/10.1609/aaai.v40i33.40018

Abstract

Large Vision-Language Models (LVLMs) extend the capabilities of Large Language Models by integrating visual inputs, enabling advanced multimodal reasoning across diverse applications. However, these enhanced reasoning capabilities introduce new security risks, particularly susceptibility to jailbreak attacks that bypass built-in safety mechanisms to elicit harmful or unauthorized outputs. While recent efforts have explored adversarial and typographic prompts, most existing attacks suffer from three key limitations: reliance on auxiliary models, limited effectiveness in black-box scenarios, and inadequate exploitation of the LVLMs' intrinsic reasoning abilities. In this work, we propose TVChain, a novel black-box jailbreaking framework that explicitly intervenes in both the visual and textual reasoning processes of LVLMs. TVChain decomposes a malicious prompt into a sequence of semantically meaningful sub-images that represent the relevant objects and behaviors, thereby avoiding direct exposure of illicit content. In parallel, a carefully designed chain-of-thought (CoT) textual prompt steers the model's reasoning toward reconstructing the intended activity in a covert yet effective manner. We demonstrate that this compositional prompting strategy reduces the likelihood of triggering safety mechanisms while preserving attack efficacy. Extensive evaluations on eleven LVLMs (seven open-source and four commercial) across two benchmark datasets and three state-of-the-art defenses validate the effectiveness and robustness of TVChain.

Published

2026-03-14

How to Cite

Yu, H., Liang, K., Duan, J., Wang, J., Wang, S., Ma, C., & Liu, X. (2026). TVChain: Leveraging Textual-Visual Prompt Chains for Jailbreaking Large Vision-Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(33), 27943–27951. https://doi.org/10.1609/aaai.v40i33.40018

Section

AAAI Technical Track on Machine Learning X