Multi-Faceted Attack: Exposing Cross-Model Vulnerabilities in Defense-Equipped Vision-Language Models
DOI:
https://doi.org/10.1609/aaai.v40i44.41144Abstract
The growing misuse of Vision-Language Models (VLMs) has led providers to deploy multiple safeguards—alignment tuning, system prompt, and content moderation. Yet the real-world robustness of these defenses against adversarial attack remains underexplored. We introduce Multi-Faceted Attack (MFA), a framework that systematically uncovers general safety vulnerabilities in leading defense-equipped VLMs, including GPT-4o, Gemini-Pro, and LLaMA 4, etc. Central to MFA is the Attention-Transfer Attack (ATA), which conceals harmful instructions inside a meta task with competing objectives. We offer a theoretical perspective grounded in reward-hacking to explain why such an attack can succeed. To maximize cross-model transfer, we introduce a lightweight transfer-enhancement algorithm combined with a simple repetition strategy that jointly evades both input- and output-level filters—without any model-specific fine-tuning. We empirically show that adversarial images optimized for one vision encoder transfer broadly to unseen VLMs, indicating that shared visual representations create a cross-model safety vulnerability. Combined, MFA reaches a 58.5% overall attack success rate, consistently outperforming existing methods. Notably, on state-of-the-art commercial models, MFA achieves a 52.8% success rate, outperforming the second-best attack by 34%. These findings challenge the perceived robustness of current defensive mechanisms, systematically expose general safety loopholes within defense-equipped VLMs, and offer a practical probe for diagnosing and evaluating the safety of VLMs.Downloads
Published
2026-03-14
How to Cite
Yang, Y., Wang, L., Zhang, J., Liu, C. H., Hong, L., & Xu, Q. (2026). Multi-Faceted Attack: Exposing Cross-Model Vulnerabilities in Defense-Equipped Vision-Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(44), 38066–38074. https://doi.org/10.1609/aaai.v40i44.41144
Issue
Section
AAAI Special Track on AI Alignment