An Adaptive Sampling Framework for Diffusion-based Dataset Distillation with High Fidelity and Diversity

Authors

  • Sunbeom Jeong (Department of Electrical and Computer Engineering, Seoul National University)
  • Sehwan Kim (Department of Electrical and Computer Engineering, Seoul National University)
  • Hyeonggeun Han (Department of Electrical and Computer Engineering, Seoul National University)
  • Hyungjun Joo (Samsung Electronics)
  • Sangwoo Hong (Department of Computer Science and Engineering, Konkuk University)
  • Jungwoo Lee (Department of Electrical and Computer Engineering, Seoul National University; NextQuantum, Seoul National University; Hodoo AI Labs)

DOI:

https://doi.org/10.1609/aaai.v40i7.37447

Abstract

Dataset distillation (DD) aims to generate a compact synthetic dataset that enables efficient training of neural networks while maintaining performance comparable to training on the original dataset. Existing methods, however, suffer from two main limitations: they either rely on computationally intensive iterative optimization or depend heavily on architecture-specific designs. These issues limit their practicality for large-scale datasets and hinder generalization across model architectures. To overcome these challenges, recent research has explored diffusion models as an architecture-agnostic approach to dataset distillation, offering improved scalability and generalization. While diffusion-based dataset distillation methods have shown considerable potential, several challenges remain. Some approaches exhibit a distributional mismatch between the pre-trained diffusion model and the target dataset, which degrades the fidelity and representativeness of the generated samples; others require substantial fine-tuning to achieve high fidelity, which negates the benefit of architectural flexibility. In this work, we propose a new diffusion-based dataset distillation framework that preserves the characteristics of the original dataset without requiring any fine-tuning. Our method employs adaptive sampling and repulsion regularization to enhance both the fidelity and diversity of generated samples. As a result, the proposed approach outperforms state-of-the-art distillation methods across a wide range of datasets and model architectures.
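The repulsion regularization mentioned above can be illustrated with a minimal toy sketch. This is an illustrative assumption, not the paper's actual formulation: it uses an RBF-kernel potential between flattened samples, and a gradient step away from that potential pushes synthetic samples apart, which is the kind of diversity-promoting term a diffusion sampling loop could add at each step. The function name `repulsion_gradient` and all parameters are hypothetical.

```python
import numpy as np

def repulsion_gradient(samples, sigma=1.0):
    """Direction that pushes each sample away from its neighbours.

    Defined as the negative gradient (w.r.t. each sample) of the
    attraction potential sum_{i != j} exp(-||x_i - x_j||^2 / (2 sigma^2)),
    so a small step along it decreases pairwise kernel similarity.
    """
    n = samples.shape[0]
    flat = samples.reshape(n, -1)
    diffs = flat[:, None, :] - flat[None, :, :]          # (n, n, d): x_i - x_j
    sq = (diffs ** 2).sum(axis=-1)                       # squared pairwise distances
    k = np.exp(-sq / (2.0 * sigma ** 2))                 # RBF kernel weights
    np.fill_diagonal(k, 0.0)                             # ignore self-interaction
    # -d/dx_i [sum_j k_ij] = sum_j k_ij * (x_i - x_j) / sigma^2
    rep = (k[:, :, None] * diffs).sum(axis=1) / sigma ** 2
    return rep.reshape(samples.shape)

def kernel_potential(samples, sigma=1.0):
    """Total pairwise RBF similarity; lower means more diverse samples."""
    flat = samples.reshape(samples.shape[0], -1)
    sq = ((flat[:, None, :] - flat[None, :, :]) ** 2).sum(axis=-1)
    k = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(k, 0.0)
    return k.sum()

# One guided update on a toy batch: step along the repulsion direction.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 2))
x_new = x + 0.1 * repulsion_gradient(x)
```

In an actual diffusion sampler, a term like this would be added to the per-step update of the latent batch, trading off against the denoising direction; here the effect is simply that the pairwise kernel potential drops after the step.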

Published

2026-03-14

How to Cite

Jeong, S., Kim, S., Han, H., Joo, H., Hong, S., & Lee, J. (2026). An Adaptive Sampling Framework for Diffusion-based Dataset Distillation with High Fidelity and Diversity. Proceedings of the AAAI Conference on Artificial Intelligence, 40(7), 5314–5322. https://doi.org/10.1609/aaai.v40i7.37447

Section

AAAI Technical Track on Computer Vision IV