SIDE: Surrogate Conditional Data Extraction from Diffusion Models

Yunhao Chen; Shujie Wang; Difan Zou; Xingjun Ma

doi:10.1609/aaai.v40i1.36972

Authors

Yunhao Chen, Fudan University
Shujie Wang, Fudan University
Difan Zou, University of Hong Kong
Xingjun Ma, Fudan University

Abstract

As diffusion probabilistic models (DPMs) become central to Generative AI (GenAI), understanding their memorization behavior is essential for evaluating risks such as data leakage, copyright infringement, and trustworthiness. While prior research finds conditional DPMs highly susceptible to data extraction attacks using explicit prompts, unconditional models are often assumed to be safe. We challenge this view by introducing Surrogate condItional Data Extraction (SIDE), a general framework that constructs data-driven surrogate conditions to enable targeted extraction from any DPM. Through extensive experiments on CIFAR-10, CelebA, ImageNet, and LAION-5B, we show that SIDE can successfully extract training data from so-called safe unconditional models, outperforming baseline attacks even on conditional models. Complementing these findings, we present a unified theoretical framework based on informative labels, demonstrating that all forms of conditioning, explicit or surrogate, amplify memorization. Our work redefines the threat landscape for DPMs, establishing precise conditioning as a fundamental vulnerability and setting a new, stronger benchmark for model privacy evaluation.
