LAMP: Learning Universal Adversarial Perturbations for Multi-Image Tasks via Pre-trained Models

Authors

  • Alvi Md Ishmam, Virginia Polytechnic Institute and State University
  • Najibul Haque Sarker, Virginia Polytechnic Institute and State University
  • Zaber Ibn Abdul Hakim, Virginia Polytechnic Institute and State University
  • Chris Thomas, Virginia Polytechnic Institute and State University

DOI:

https://doi.org/10.1609/aaai.v40i7.37442

Abstract

Multimodal Large Language Models (MLLMs) have achieved remarkable performance across vision-language tasks. Recent advancements allow these models to process multiple images as inputs. However, the vulnerabilities of multi-image MLLMs remain unexplored. Existing adversarial attacks focus on single-image settings and often assume a white-box threat model, which is impractical in many real-world scenarios. This paper introduces LAMP, a black-box method for learning universal adversarial perturbations (UAPs) that target multi-image MLLMs. LAMP applies an attention-based constraint that prevents the model from effectively aggregating information across images. LAMP also introduces a novel cross-image contagious constraint that forces perturbed tokens to influence clean tokens, spreading adversarial effects without requiring all inputs to be modified. Additionally, an index-attention suppression loss yields a robust, position-invariant attack. Experimental results show that LAMP outperforms state-of-the-art baselines and achieves the highest attack success rates across multiple vision-language tasks.
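
To make the core idea concrete, the sketch below shows a minimal, hypothetical optimization loop for a universal perturbation that suppresses cross-image attention, using a toy self-attention layer as a stand-in surrogate model. The model, loss form, block structure, and hyperparameters are illustrative assumptions, not the authors' implementation or released code.

```python
# Illustrative sketch only: optimize a shared (universal) perturbation so that
# attention flowing between tokens of different images is suppressed.
import torch
import torch.nn as nn

torch.manual_seed(0)

d, tokens_per_img, n_imgs = 64, 16, 3
attn = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)  # toy surrogate

# Universal perturbation shared across all samples, applied to one image slot.
delta = torch.zeros(1, 1, tokens_per_img, d, requires_grad=True)
opt = torch.optim.Adam([delta], lr=1e-2)
eps = 0.5  # perturbation budget (L_inf-style clamp)

for step in range(200):
    imgs = torch.randn(8, n_imgs, tokens_per_img, d)  # stand-in for clean token embeddings

    # Perturb only the first image; the remaining images stay clean.
    pert_first = imgs[:, :1] + delta
    perturbed = torch.cat([pert_first, imgs[:, 1:]], dim=1)

    x = perturbed.flatten(1, 2)                         # (B, n_imgs * tokens_per_img, d)
    _, attn_weights = attn(x, x, x, need_weights=True)  # (B, N, N), averaged over heads

    # Keep only cross-image attention (zero the within-image diagonal blocks)
    # and minimize its mass, so information cannot be aggregated across images.
    n = tokens_per_img
    cross = attn_weights.clone()
    for i in range(n_imgs):
        cross[:, i * n:(i + 1) * n, i * n:(i + 1) * n] = 0.0
    loss = cross.mean()

    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():
        delta.clamp_(-eps, eps)  # keep the universal perturbation within budget
```

In the black-box setting described in the abstract, gradients would come from publicly available pre-trained surrogate models rather than the target MLLM; this toy loop only illustrates the attention-suppression objective, not the contagious or index-attention components.
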

Published

2026-03-14

How to Cite

Ishmam, A. M., Sarker, N. H., Abdul Hakim, Z. I., & Thomas, C. (2026). LAMP: Learning Universal Adversarial Perturbations for Multi-Image Tasks via Pre-trained Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(7), 5267–5275. https://doi.org/10.1609/aaai.v40i7.37442

Issue

Vol. 40 No. 7 (2026)

Section

AAAI Technical Track on Computer Vision IV