SIDE: Surrogate Conditional Data Extraction from Diffusion Models

Authors

  • Yunhao Chen, Fudan University
  • Shujie Wang, Fudan University
  • Difan Zou, University of Hong Kong
  • Xingjun Ma, Fudan University

DOI:

https://doi.org/10.1609/aaai.v40i1.36972

Abstract

As diffusion probabilistic models (DPMs) become central to Generative AI (GenAI), understanding their memorization behavior is essential for evaluating risks such as data leakage, copyright infringement, and trustworthiness. While prior research finds conditional DPMs highly susceptible to data extraction attacks using explicit prompts, unconditional models are often assumed to be safe. We challenge this view by introducing Surrogate condItional Data Extraction (SIDE), a general framework that constructs data-driven surrogate conditions to enable targeted extraction from any DPM. Through extensive experiments on CIFAR-10, CelebA, ImageNet, and LAION-5B, we show that SIDE can successfully extract training data from so-called safe unconditional models, outperforming baseline attacks even on conditional models. Complementing these findings, we present a unified theoretical framework based on informative labels, demonstrating that all forms of conditioning, explicit or surrogate, amplify memorization. Our work redefines the threat landscape for DPMs, establishing precise conditioning as a fundamental vulnerability and setting a new, stronger benchmark for model privacy evaluation.

Published

2026-03-14

How to Cite

Chen, Y., Wang, S., Zou, D., & Ma, X. (2026). SIDE: Surrogate Conditional Data Extraction from Diffusion Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(1), 128–136. https://doi.org/10.1609/aaai.v40i1.36972

Section

AAAI Technical Track on Application Domains I