LAMP: Learning Universal Adversarial Perturbations for Multi-Image Tasks via Pre-trained Models

Authors

  • Alvi Md Ishmam, Virginia Polytechnic Institute and State University
  • Najibul Haque Sarker, Virginia Polytechnic Institute and State University
  • Zaber Ibn Abdul Hakim, Virginia Polytechnic Institute and State University
  • Chris Thomas, Virginia Polytechnic Institute and State University

DOI:

https://doi.org/10.1609/aaai.v40i7.37442

Abstract

Multimodal Large Language Models (MLLMs) have achieved remarkable performance across vision-language tasks. Recent advancements allow these models to process multiple images as inputs. However, the vulnerabilities of multi-image MLLMs remain unexplored. Existing adversarial attacks focus on single-image settings and often assume a white-box threat model, which is impractical in many real-world scenarios. This paper introduces LAMP, a black-box method for learning universal adversarial perturbations (UAPs) that target multi-image MLLMs. LAMP applies an attention-based constraint that prevents the model from effectively aggregating information across images. LAMP also introduces a novel cross-image contagious constraint that forces perturbed tokens to influence clean tokens, spreading adversarial effects without requiring all inputs to be modified. Additionally, an index-attention suppression loss yields a robust, position-invariant attack. Experimental results show that LAMP outperforms state-of-the-art baselines and achieves the highest attack success rates across multiple vision-language tasks.
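
To make the core idea concrete, the sketch below shows a minimal, hypothetical optimization loop for a universal perturbation that suppresses cross-image attention, using a toy self-attention layer as a stand-in surrogate model. The model, loss form, block structure, and hyperparameters are illustrative assumptions, not the authors' implementation or released code.

```python
# Illustrative sketch only: optimize a shared (universal) perturbation so that
# attention flowing between tokens of different images is suppressed.
import torch
import torch.nn as nn

torch.manual_seed(0)

d, tokens_per_img, n_imgs = 64, 16, 3
attn = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)  # toy surrogate

# Universal perturbation shared across all samples, applied to one image slot.
delta = torch.zeros(1, 1, tokens_per_img, d, requires_grad=True)
opt = torch.optim.Adam([delta], lr=1e-2)
eps = 0.5  # perturbation budget (L_inf-style clamp)

for step in range(200):
    imgs = torch.randn(8, n_imgs, tokens_per_img, d)  # stand-in for clean token embeddings

    # Perturb only the first image; the remaining images stay clean.
    pert_first = imgs[:, :1] + delta
    perturbed = torch.cat([pert_first, imgs[:, 1:]], dim=1)

    x = perturbed.flatten(1, 2)                         # (B, n_imgs * tokens_per_img, d)
    _, attn_weights = attn(x, x, x, need_weights=True)  # (B, N, N), averaged over heads

    # Keep only cross-image attention (zero the within-image diagonal blocks)
    # and minimize its mass, so information cannot be aggregated across images.
    n = tokens_per_img
    cross = attn_weights.clone()
    for i in range(n_imgs):
        cross[:, i * n:(i + 1) * n, i * n:(i + 1) * n] = 0.0
    loss = cross.mean()

    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():
        delta.clamp_(-eps, eps)  # keep the universal perturbation within budget
```

In the black-box setting described in the abstract, gradients would come from publicly available pre-trained surrogate models rather than the target MLLM; this toy loop only illustrates the attention-suppression objective, not the contagious or index-attention components.
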

Published

2026-03-14

How to Cite

Ishmam, A. M., Sarker, N. H., Abdul Hakim, Z. I., & Thomas, C. (2026). LAMP: Learning Universal Adversarial Perturbations for Multi-Image Tasks via Pre-trained Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(7), 5267–5275. https://doi.org/10.1609/aaai.v40i7.37442

Issue

Vol. 40 No. 7 (2026)

Section

AAAI Technical Track on Computer Vision IV