An Adaptive Sampling Framework for Diffusion-based Dataset Distillation with High Fidelity and Diversity

Authors

  • Sunbeom Jeong (Department of Electrical and Computer Engineering, Seoul National University)
  • Sehwan Kim (Department of Electrical and Computer Engineering, Seoul National University)
  • Hyeonggeun Han (Department of Electrical and Computer Engineering, Seoul National University)
  • Hyungjun Joo (Samsung Electronics)
  • Sangwoo Hong (Department of Computer Science and Engineering, Konkuk University)
  • Jungwoo Lee (Department of Electrical and Computer Engineering, Seoul National University; NextQuantum, Seoul National University; Hodoo AI Labs)

DOI:

https://doi.org/10.1609/aaai.v40i7.37447

Abstract

Dataset distillation (DD) aims to generate a compact synthetic dataset that enables efficient training of neural networks while maintaining performance comparable to training on the original dataset. Existing methods, however, suffer from two main limitations: they either rely on computationally intensive iterative optimization or depend heavily on architecture-specific designs. These issues limit their practicality for large-scale datasets and hinder generalization across model architectures. To overcome these challenges, recent research has explored diffusion models as an architecture-agnostic approach to dataset distillation, offering improved scalability and generalization. While diffusion-based dataset distillation methods have shown considerable potential, several challenges remain. Some approaches exhibit a distributional mismatch between the pre-trained diffusion model and the target dataset, which degrades the fidelity and representativeness of the generated samples; others require substantial fine-tuning to achieve high fidelity, which negates the benefit of architectural flexibility. In this work, we propose a new diffusion-based dataset distillation framework that preserves the characteristics of the original dataset without requiring any fine-tuning. Our method employs adaptive sampling and repulsion regularization to enhance both the fidelity and diversity of generated samples. As a result, the proposed approach outperforms state-of-the-art distillation methods across a wide range of datasets and model architectures.
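The repulsion regularization mentioned above can be illustrated with a minimal toy sketch. This is an illustrative assumption, not the paper's actual formulation: it uses an RBF-kernel potential between flattened samples, and a gradient step away from that potential pushes synthetic samples apart, which is the kind of diversity-promoting term a diffusion sampling loop could add at each step. The function name `repulsion_gradient` and all parameters are hypothetical.

```python
import numpy as np

def repulsion_gradient(samples, sigma=1.0):
    """Direction that pushes each sample away from its neighbours.

    Defined as the negative gradient (w.r.t. each sample) of the
    attraction potential sum_{i != j} exp(-||x_i - x_j||^2 / (2 sigma^2)),
    so a small step along it decreases pairwise kernel similarity.
    """
    n = samples.shape[0]
    flat = samples.reshape(n, -1)
    diffs = flat[:, None, :] - flat[None, :, :]          # (n, n, d): x_i - x_j
    sq = (diffs ** 2).sum(axis=-1)                       # squared pairwise distances
    k = np.exp(-sq / (2.0 * sigma ** 2))                 # RBF kernel weights
    np.fill_diagonal(k, 0.0)                             # ignore self-interaction
    # -d/dx_i [sum_j k_ij] = sum_j k_ij * (x_i - x_j) / sigma^2
    rep = (k[:, :, None] * diffs).sum(axis=1) / sigma ** 2
    return rep.reshape(samples.shape)

def kernel_potential(samples, sigma=1.0):
    """Total pairwise RBF similarity; lower means more diverse samples."""
    flat = samples.reshape(samples.shape[0], -1)
    sq = ((flat[:, None, :] - flat[None, :, :]) ** 2).sum(axis=-1)
    k = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(k, 0.0)
    return k.sum()

# One guided update on a toy batch: step along the repulsion direction.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 2))
x_new = x + 0.1 * repulsion_gradient(x)
```

In an actual diffusion sampler, a term like this would be added to the per-step update of the latent batch, trading off against the denoising direction; here the effect is simply that the pairwise kernel potential drops after the step.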

Published

2026-03-14

How to Cite

Jeong, S., Kim, S., Han, H., Joo, H., Hong, S., & Lee, J. (2026). An Adaptive Sampling Framework for Diffusion-based Dataset Distillation with High Fidelity and Diversity. Proceedings of the AAAI Conference on Artificial Intelligence, 40(7), 5314–5322. https://doi.org/10.1609/aaai.v40i7.37447

Section

AAAI Technical Track on Computer Vision IV