SEER: Backdoor Detection for Vision-Language Models through Searching Target Text and Image Trigger Jointly

Authors

  • Liuwan Zhu University of Hawaii at Manoa
  • Rui Ning Old Dominion University
  • Jiang Li Old Dominion University
  • Chunsheng Xin Old Dominion University
  • Hongyi Wu University of Arizona

DOI:

https://doi.org/10.1609/aaai.v38i7.28611

Keywords:

CV: Language and Vision, CV: Adversarial Attacks & Robustness

Abstract

This paper proposes SEER, a novel backdoor detection algorithm for vision-language models, addressing a gap in the literature on multi-modal backdoor detection. While backdoor detection in single-modal models has been well studied, defenses for multi-modal models remain limited, and existing single-modal defense mechanisms cannot be applied directly to multi-modal settings because of the increased complexity and the explosion of the search space. SEER detects backdoors in vision-language models by jointly searching for the image trigger and the malicious target text in the feature space shared by the vision and language modalities. Extensive experiments demonstrate that SEER achieves over 92% detection rate on backdoored vision-language models across various settings, without access to training data or knowledge of downstream tasks.
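To illustrate the core idea of the joint search, the toy sketch below optimizes an additive image trigger and a (continuously relaxed) target-text embedding together, so that triggered images and the target text align in a shared CLIP-style feature space. Everything here is an assumption for illustration: the random linear encoders stand in for the suspect model's frozen vision and text encoders, the text search is relaxed to a continuous embedding (the actual SEER algorithm searches discrete target text and applies detection criteria not shown here), and finite-difference gradient ascent replaces whatever optimizer the paper uses.

```python
import numpy as np

rng = np.random.default_rng(0)
d_img, d_txt, d_feat = 32, 16, 8  # toy dimensions, chosen arbitrarily

# Hypothetical stand-ins for the suspect model's frozen encoders:
# random linear maps into a shared feature space.
W_img = rng.normal(size=(d_feat, d_img)) / np.sqrt(d_img)
W_txt = rng.normal(size=(d_feat, d_txt)) / np.sqrt(d_txt)

def image_features(X, delta):
    """Encode images carrying an additive trigger `delta`, L2-normalized."""
    Z = (X + delta) @ W_img.T
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)

def text_feature(t):
    """Encode a relaxed (continuous) target-text embedding, L2-normalized."""
    z = W_txt @ t
    return z / np.linalg.norm(z)

def score(X, delta, t):
    """Mean cosine similarity between triggered images and the target text."""
    return float(np.mean(image_features(X, delta) @ text_feature(t)))

def num_grad(f, v, eps=1e-4):
    """Central finite-difference gradient of scalar f at vector v."""
    g = np.zeros_like(v)
    for i in range(v.size):
        e = np.zeros_like(v)
        e[i] = eps
        g[i] = (f(v + e) - f(v - e)) / (2 * eps)
    return g

def joint_search(X, steps=300, lr=0.5):
    """Jointly ascend the trigger `delta` and the target-text embedding `t`
    to maximize their agreement in the shared feature space."""
    delta = np.zeros(d_img)
    t = rng.normal(size=d_txt)
    scores = [score(X, delta, t)]
    for _ in range(steps):
        delta = delta + lr * num_grad(lambda d: score(X, d, t), delta)
        t = t + lr * num_grad(lambda u: score(X, delta, u), t)
        scores.append(score(X, delta, t))
    return delta, t, scores

X = rng.normal(size=(8, d_img))  # a few clean probe images
delta, t, scores = joint_search(X)
print(f"shared-space similarity: {scores[0]:.3f} -> {scores[-1]:.3f}")
```

In a real detection setting, the size of the recovered trigger and the strength of the resulting alignment would then be compared against those of clean models to decide whether the model is backdoored; that decision rule is part of SEER and is not reproduced in this sketch.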

Published

2024-03-24

How to Cite

Zhu, L., Ning, R., Li, J., Xin, C., & Wu, H. (2024). SEER: Backdoor Detection for Vision-Language Models through Searching Target Text and Image Trigger Jointly. Proceedings of the AAAI Conference on Artificial Intelligence, 38(7), 7766-7774. https://doi.org/10.1609/aaai.v38i7.28611

Section

AAAI Technical Track on Computer Vision VI