Medical Multimodal Model Stealing Attacks via Adversarial Domain Alignment

Authors

  • Yaling Shen (Bosch Center for Artificial Intelligence, Germany; Technical University of Munich, Germany; Munich Center for Machine Learning, Germany)
  • Zhixiong Zhuang (Bosch Center for Artificial Intelligence, Germany; Saarland University, Germany)
  • Kun Yuan (Technical University of Munich, Germany; Munich Center for Machine Learning, Germany; University of Strasbourg, France)
  • Maria-Irina Nicolae (Bosch Center for Artificial Intelligence, Germany)
  • Nassir Navab (Technical University of Munich, Germany)
  • Nicolas Padoy (University of Strasbourg, France; IHU Strasbourg, France)
  • Mario Fritz (CISPA Helmholtz Center for Information Security, Germany)

DOI:

https://doi.org/10.1609/aaai.v39i7.32734

Abstract

Medical multimodal large language models (MLLMs) are becoming an instrumental part of healthcare systems, assisting medical personnel with decision-making and results analysis. Models for radiology report generation are able to interpret medical imagery, thus reducing the workload of radiologists. As medical data is scarce and protected by privacy regulations, medical MLLMs represent valuable intellectual property. However, these assets are potentially vulnerable to model stealing, where attackers aim to replicate their functionality via black-box access. So far, model stealing for the medical domain has focused on image classification; existing attacks are not effective against MLLMs. In this paper, we introduce Adversarial Domain Alignment (ADA-Steal), the first stealing attack against medical MLLMs. ADA-Steal relies on natural images, which are public and widely available, as opposed to their medical counterparts. We show that data augmentation with adversarial noise is sufficient to overcome the data distribution gap between natural images and the domain-specific distribution of the victim MLLM. Experiments on the IU X-Ray and MIMIC-CXR radiology datasets demonstrate that Adversarial Domain Alignment enables attackers to steal the medical MLLM without any access to medical data.

Published

2025-04-11

How to Cite

Shen, Y., Zhuang, Z., Yuan, K., Nicolae, M.-I., Navab, N., Padoy, N., & Fritz, M. (2025). Medical Multimodal Model Stealing Attacks via Adversarial Domain Alignment. Proceedings of the AAAI Conference on Artificial Intelligence, 39(7), 6842–6850. https://doi.org/10.1609/aaai.v39i7.32734

Section

AAAI Technical Track on Computer Vision VI