Multi-Faceted Attack: Exposing Cross-Model Vulnerabilities in Defense-Equipped Vision-Language Models

Yijun Yang; Lichao Wang; Jianping Zhang; Chi Harold Liu; Lanqing Hong; Qiang Xu

doi:10.1609/aaai.v40i44.41144

Authors

Yijun Yang, The Chinese University of Hong Kong
Lichao Wang, Beijing Institute of Technology
Jianping Zhang, The Chinese University of Hong Kong
Chi Harold Liu, Beijing Institute of Technology
Lanqing Hong, Huawei Technologies Ltd.
Qiang Xu, The Chinese University of Hong Kong

Abstract

The growing misuse of Vision-Language Models (VLMs) has led providers to deploy multiple safeguards: alignment tuning, system prompts, and content moderation. Yet the real-world robustness of these defenses against adversarial attacks remains underexplored. We introduce Multi-Faceted Attack (MFA), a framework that systematically uncovers general safety vulnerabilities in leading defense-equipped VLMs, including GPT-4o, Gemini-Pro, and LLaMA 4. Central to MFA is the Attention-Transfer Attack (ATA), which conceals harmful instructions inside a meta task with competing objectives. We offer a theoretical perspective grounded in reward hacking to explain why such an attack can succeed. To maximize cross-model transfer, we introduce a lightweight transfer-enhancement algorithm combined with a simple repetition strategy that jointly evades both input- and output-level filters, without any model-specific fine-tuning. We empirically show that adversarial images optimized for one vision encoder transfer broadly to unseen VLMs, indicating that shared visual representations create a cross-model safety vulnerability. With all components combined, MFA reaches a 58.5% overall attack success rate, consistently outperforming existing methods. Notably, on state-of-the-art commercial models, MFA achieves a 52.8% success rate, outperforming the second-best attack by 34%. These findings challenge the perceived robustness of current defense mechanisms, systematically expose general safety loopholes in defense-equipped VLMs, and offer a practical probe for diagnosing and evaluating VLM safety.
