PurMM: Attention-Guided Test-Time Backdoor Purification in Multimodal Large Language Models

Authors

  • Wenzheng Jiang, National University of Defense Technology
  • Ke Liang, National University of Defense Technology
  • Xuankun Rong, Wuhan University
  • Jingxuan Zhou, National University of Defense Technology
  • Zhengyi Zhong, National University of Defense Technology
  • Guancheng Wan, Wuhan University
  • Ji Wang, National University of Defense Technology

DOI:

https://doi.org/10.1609/aaai.v40i42.40867

Abstract

Downstream fine-tuning of Multimodal Large Language Models (MLLMs) is advancing rapidly, allowing general models to achieve superior performance on domain-specific tasks. Yet most prior research focuses on performance gains and overlooks a vulnerability of the fine-tuning pipeline: attackers can easily poison the dataset to implant backdoors into MLLMs. We conduct an in-depth investigation of backdoor attacks on MLLMs and reveal the phenomenon of Attention Hijacking and its Hierarchical Mechanism. Guided by this insight, we propose PurMM, a test-time backdoor purification framework that removes visual tokens exhibiting anomalous attention, thereby preventing the attacker-targeted outputs while restoring correct answers. PurMM consists of three stages: (1) locating tokens with abnormal attention, (2) filtering them using deep-layer cues, and (3) zeroing out their corresponding components in the visual embeddings. Unlike existing defenses, PurMM requires neither retraining nor modifications to the training process; it operates purely at test time to restore model performance while eliminating the backdoor. Extensive experiments across multiple MLLMs and datasets show that PurMM maintains normal performance, sharply reduces attack success rates, and consistently converts backdoor outputs to benign ones, offering a new perspective for safeguarding MLLMs.
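The three stages described in the abstract can be sketched in code. The following is a minimal illustrative sketch, not the authors' implementation: the function name, the z-score anomaly criterion, the threshold value, and the shapes of the attention inputs are all assumptions made for illustration, since the abstract does not specify how "abnormal attention" is scored or how deep-layer cues filter the candidates.

```python
import numpy as np

def purify_visual_tokens(visual_emb, shallow_attn, deep_attn, z_thresh=2.5):
    """Hypothetical sketch of a PurMM-style purification pass.

    visual_emb   : (T, D) array of visual token embeddings
    shallow_attn : (T,) attention mass each visual token receives in shallow layers
    deep_attn    : (T,) attention mass each visual token receives in deep layers
    z_thresh     : assumed z-score cutoff for flagging anomalous tokens
    """
    # Stage 1: locate tokens whose shallow-layer attention is anomalously high.
    z = (shallow_attn - shallow_attn.mean()) / (shallow_attn.std() + 1e-8)
    candidates = np.where(z > z_thresh)[0]

    # Stage 2: filter the candidates with deep-layer cues, keeping only tokens
    # that also dominate attention in the deep layers.
    deep_z = (deep_attn - deep_attn.mean()) / (deep_attn.std() + 1e-8)
    confirmed = [i for i in candidates if deep_z[i] > z_thresh]

    # Stage 3: zero out the confirmed tokens' visual embeddings so they cannot
    # steer generation toward the attacker's targeted output.
    purified = visual_emb.copy()
    purified[confirmed] = 0.0
    return purified, confirmed
```

Because all three stages act only on embeddings and attention statistics available at inference, a defense shaped like this needs no retraining, matching the test-time property claimed in the abstract.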

Published

2026-03-14

How to Cite

Jiang, W., Liang, K., Rong, X., Zhou, J., Zhong, Z., Wan, G., & Wang, J. (2026). PurMM: Attention-Guided Test-Time Backdoor Purification in Multimodal Large Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(42), 35562–35570. https://doi.org/10.1609/aaai.v40i42.40867

Section

AAAI Technical Track on Philosophy and Ethics of AI