M3D: Dataset Condensation by Minimizing Maximum Mean Discrepancy

Authors

  • Hansong Zhang: Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100092, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China
  • Shikun Li: Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100092, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China
  • Pengju Wang: Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100092, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China
  • Dan Zeng: Department of Communication Engineering, Shanghai University, Shanghai 200040, China
  • Shiming Ge: Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100092, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China

DOI:

https://doi.org/10.1609/aaai.v38i8.28784

Keywords:

DMKM: Data Compression, CV: Applications, CV: Learning & Optimization for CV, ML: Applications

Abstract

Training state-of-the-art (SOTA) deep models often requires extensive data, resulting in substantial training and storage costs. To address these challenges, dataset condensation has been developed to learn a small synthetic set that preserves essential information from the original large-scale dataset. Currently, optimization-oriented methods are the dominant approach in dataset condensation and achieve SOTA results. However, their bi-level optimization process hinders their practical application to realistic, larger datasets. To improve condensation efficiency, previous works proposed Distribution Matching (DM) as an alternative, which significantly reduces the condensation cost. Nonetheless, current DM-based methods still lag behind SOTA optimization-oriented methods. In this paper, we argue that existing DM-based methods overlook the higher-order alignment of the distributions, which may lead to sub-optimal matching results. Motivated by this, we present a novel DM-based method named M3D for dataset condensation by Minimizing the Maximum Mean Discrepancy between feature representations of the synthetic and real images. By embedding their distributions in a reproducing kernel Hilbert space, we align all orders of moments of the distributions of real and synthetic images, resulting in a more generalized condensed set. Notably, our method even surpasses the SOTA optimization-oriented method IDC on the high-resolution ImageNet dataset. Extensive analysis is conducted to verify the effectiveness of the proposed method. Source code is available at https://github.com/Hansong-Zhang/M3D.
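The abstract's central claim is that matching only the means of two feature distributions can miss differences that a kernel-based Maximum Mean Discrepancy (MMD) detects. The sketch below illustrates that idea with a generic, biased MMD² estimator; it is not the M3D implementation. Several choices here are assumptions made for illustration: the Gaussian (RBF) kernel, the bandwidth `sigma`, the toy 1-D "features", and the helper names `gaussian_kernel`, `mmd2` and `mean_matching_loss`, which are all hypothetical. `mean_matching_loss` is a simplified stand-in for first-moment distribution matching, not the exact loss of any published DM method.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).

    Assumption: kernel choice and bandwidth are illustrative only.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2(xs: List[Vector], ys: List[Vector], sigma: float = 1.0) -> float:
    """Biased (V-statistic) estimate of squared MMD between two samples.

    MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)],
    i.e. the squared RKHS distance between the two mean embeddings.
    """
    def mean_k(a: List[Vector], b: List[Vector]) -> float:
        total = sum(gaussian_kernel(p, q, sigma) for p in a for q in b)
        return total / (len(a) * len(b))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


def mean_matching_loss(xs: List[Vector], ys: List[Vector]) -> float:
    """Squared Euclidean distance between the sample means (first moment only).

    Hypothetical simplified stand-in for a mean-based distribution-matching loss.
    """
    dim = len(xs[0])
    mean_x = [sum(v[d] for v in xs) / len(xs) for d in range(dim)]
    mean_y = [sum(v[d] for v in ys) / len(ys) for d in range(dim)]
    return sum((a - b) ** 2 for a, b in zip(mean_x, mean_y))


if __name__ == "__main__":
    # Toy "real" features spread around 0 and toy "synthetic" features
    # collapsed at 0: identical means, very different spread.
    real = [(-2.0,), (2.0,)]
    synthetic = [(0.0,), (0.0,)]
    print("mean-matching loss:", mean_matching_loss(real, synthetic))
    print("MMD^2:", mmd2(real, synthetic))
```

In this toy case both sets have mean 0, so the mean-matching loss is exactly zero, while the MMD² estimate is clearly positive because the Gaussian kernel is characteristic and therefore sensitive to higher-order differences such as variance. In an actual condensation pipeline, `xs` and `ys` would be feature representations of real and synthetic images, and the synthetic images would be updated to reduce the discrepancy; that optimization loop is omitted here.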

Published

2024-03-24

How to Cite

Zhang, H., Li, S., Wang, P., Zeng, D., & Ge, S. (2024). M3D: Dataset Condensation by Minimizing Maximum Mean Discrepancy. Proceedings of the AAAI Conference on Artificial Intelligence, 38(8), 9314-9322. https://doi.org/10.1609/aaai.v38i8.28784

Issue

Vol. 38 No. 8

Section

AAAI Technical Track on Data Mining & Knowledge Management