TVChain: Leveraging Textual-Visual Prompt Chains for Jailbreaking Large Vision-Language Models

Authors

  • Hao Yu National University of Defense Technology
  • Ke Liang National University of Defense Technology
  • Junxian Duan Institute of Automation, Chinese Academy of Sciences
  • Jun Wang National University of Defense Technology
  • Siwei Wang Intelligent Game and Decision Lab
  • Chuan Ma Chongqing University
  • Xinwang Liu National University of Defense Technology

DOI:

https://doi.org/10.1609/aaai.v40i33.40018

Abstract

Large Vision-Language Models (LVLMs) extend the capabilities of Large Language Models by integrating visual inputs, enabling advanced multimodal reasoning across diverse applications. However, these enhanced reasoning capabilities introduce new security risks, particularly susceptibility to jailbreak attacks that bypass built-in safety mechanisms to elicit harmful or unauthorized outputs. While recent efforts have explored adversarial and typographic prompts, most existing attacks suffer from three key limitations: reliance on auxiliary models, limited effectiveness in black-box scenarios, and inadequate exploitation of the LVLMs' intrinsic reasoning abilities. In this work, we propose TVChain, a novel black-box jailbreaking framework that explicitly intervenes in both the visual and textual reasoning processes of LVLMs. TVChain decomposes a malicious prompt into a sequence of semantically meaningful sub-images that represent the relevant objects and behaviors, thereby avoiding direct exposure of illicit content. In parallel, a carefully designed chain-of-thought (CoT) textual prompt steers the model's reasoning toward reconstructing the intended activity in a covert yet effective manner. We demonstrate that this compositional prompting strategy reduces the likelihood of triggering safety mechanisms while preserving attack efficacy. Extensive evaluations on eleven LVLMs (seven open-source and four commercial) across two benchmark datasets and three state-of-the-art defenses validate the effectiveness and robustness of TVChain.

Published

2026-03-14

How to Cite

Yu, H., Liang, K., Duan, J., Wang, J., Wang, S., Ma, C., & Liu, X. (2026). TVChain: Leveraging Textual-Visual Prompt Chains for Jailbreaking Large Vision-Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(33), 27943–27951. https://doi.org/10.1609/aaai.v40i33.40018

Section

AAAI Technical Track on Machine Learning X