ImageBindDC: Compressing Multi-modal Data with ImageBind-based Condensation

Authors

  • Yue Min (EPIC Lab, Shanghai Jiao Tong University; Bosch Corporate Research Asia Pacific; Hong Kong University of Science and Technology)
  • Shaobo Wang (EPIC Lab, Shanghai Jiao Tong University; Alibaba Group)
  • Jiaze Li (EPIC Lab, Shanghai Jiao Tong University)
  • Tianle Niu (EPIC Lab, Shanghai Jiao Tong University)
  • Junxin Fan (EPIC Lab, Shanghai Jiao Tong University)
  • Yongliang Miao (EPIC Lab, Shanghai Jiao Tong University)
  • Lijin Yang (Bosch Corporate Research Asia Pacific)
  • Linfeng Zhang (EPIC Lab, Shanghai Jiao Tong University)

DOI:

https://doi.org/10.1609/aaai.v40i18.38582

Abstract

Data condensation techniques aim to synthesize a compact dataset from a larger one to enable efficient model training. While successful in unimodal settings, they often fail in multimodal scenarios, where preserving intricate inter-modal dependencies is crucial. To address this, we introduce ImageBindDC, a novel data condensation framework operating within the unified feature space of ImageBind. Our approach moves beyond conventional distribution matching by employing a Characteristic Function (CF) loss, which operates in the Fourier domain and enables more precise statistical alignment through exact infinite moment matching. We design our objective to enforce three critical levels of distributional consistency: (i) unimodal alignment, which matches the statistical properties of synthetic and real data within each modality; (ii) cross-modal alignment, which preserves pairwise semantics by matching the distributions of hybrid real-synthetic data pairs; and (iii) joint-modal alignment, which captures the complete multivariate data structure by aligning the joint distribution of real data pairs with that of their synthetic counterparts. Extensive experiments highlight the effectiveness of ImageBindDC: on the NYU-v2 dataset, a model trained on just 5 condensed datapoints per class achieves lossless performance, comparable to that of a model trained on the full dataset. This sets a new state of the art, with an 8.2% absolute improvement over the previous best method and more than 4× less condensation time.
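
To make the idea of a characteristic-function loss concrete, the following is a minimal sketch and not the authors' implementation. It compares the empirical characteristic functions of a real and a synthetic feature set at randomly sampled frequencies, then applies that distance within each modality and to concatenated pairs. All function names, the Gaussian frequency sampling, the squared-modulus distance, and the use of concatenation for joint vectors are assumptions made for illustration; both modalities are assumed to share one embedding dimension, as in a unified feature space.

```python
import cmath
import random


def empirical_cf(samples, t):
    """Empirical characteristic function at frequency t: mean of exp(i * <t, x>)."""
    total = 0j
    for x in samples:
        total += cmath.exp(1j * sum(ti * xi for ti, xi in zip(t, x)))
    return total / len(samples)


def cf_distance(real, synth, freqs):
    """Mean squared modulus of the gap between two empirical CFs over freqs."""
    gaps = (abs(empirical_cf(real, t) - empirical_cf(synth, t)) ** 2 for t in freqs)
    return sum(gaps) / len(freqs)


def sample_freqs(rng, count, dim, scale=1.0):
    """Draw frequency vectors from an isotropic Gaussian (an assumed choice)."""
    return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(count)]


def joint(a, b):
    """Concatenate paired embeddings (lists) of two modalities into joint vectors."""
    return [x + y for x, y in zip(a, b)]


def uni_and_joint_loss(real_a, real_b, syn_a, syn_b, rng, n_freqs=64):
    """Hypothetical combination: per-modality CF gaps plus the CF gap of joint pairs."""
    dim = len(real_a[0])  # assumes both modalities share this dimension
    f_uni = sample_freqs(rng, n_freqs, dim)
    f_joint = sample_freqs(rng, n_freqs, 2 * dim)
    uni = cf_distance(real_a, syn_a, f_uni) + cf_distance(real_b, syn_b, f_uni)
    jnt = cf_distance(joint(real_a, real_b), joint(syn_a, syn_b), f_joint)
    return uni + jnt
```

The cross-modal term would apply the same `cf_distance` to hybrid real-synthetic pairs. The abstract does not state how those pairs are formed, so that term is left out of this sketch. In an actual condensation loop the synthetic features would be optimized by gradient descent on such a loss, which calls for an autodiff framework rather than plain Python lists.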

Published

2026-03-14

How to Cite

Min, Y., Wang, S., Li, J., Niu, T., Fan, J., Miao, Y., … Zhang, L. (2026). ImageBindDC: Compressing Multi-modal Data with ImageBind-based Condensation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(18), 15537–15545. https://doi.org/10.1609/aaai.v40i18.38582

Section

AAAI Technical Track on Data Mining & Knowledge Management II