Correspondence Coverage Matters for Multi-Modal Dataset Distillation
DOI:
https://doi.org/10.1609/aaai.v40i25.39207
Abstract
Multi-modal dataset distillation (DD) condenses large datasets into compact ones that retain task efficacy by capturing correspondence patterns, i.e., shared semantics between paired modalities. However, such patterns rely on cross-modal similarity and cannot be faithfully captured by intra-modal similarity of current unimodal strategies. As a result, current multi-modal DD methods tend to over-concentrate, redundantly encoding similar correspondence patterns and thus limiting generalizability. To this end, we propose a novel multi-modal DD framework to systematically Promote Correspondence coverage, i.e., ProCo. Initially, we develop a correspondence consistency metric based on cross-modal retrieval distributions to cluster correspondence patterns. These clusters capture the underlying correspondence distribution, enabling ProCo to initialize distilled data with representative patterns while regularizing optimization to promote correspondence representativeness and diversity. Moreover, we employ conditional neural fields for efficient distilled data parameterization, enhancing fine-grained pattern capture while allowing more distilled data under a fixed budget to boost correspondence coverage. Extensive experiments verify that our ProCo achieves superior and elastic budget-efficacy trade-offs, surpassing prior methods by over 15% with 10x distillation budget reduction, highlighting its real-world practicality.
Published
2026-03-14
How to Cite
Dang, Z., Luo, M., Jia, C., Qian, H., Zhang, X., Chang, X., & Tsang, I. (2026). Correspondence Coverage Matters for Multi-Modal Dataset Distillation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(25), 20693–20701. https://doi.org/10.1609/aaai.v40i25.39207
Issue
Section
AAAI Technical Track on Machine Learning II