Generating-Filtering-Ranking: A Three-Stage MultiModal Data Augmentation Framework Under Partial Modality Missing

Authors

  • Zhirui Kuai School of Computer Science and Engineering, Central South University, Changsha, China
  • Huan Zhang School of Computer Science and Engineering, Central South University, Changsha, China
  • Yang Yang School of Computer Science and Engineering, Central South University, Changsha, China
  • Yiping Ma School of Computer Science and Engineering, Central South University, Changsha, China
  • Mingjing Huang School of Computer Science and Engineering, Central South University, Changsha, China
  • Ning Gui School of Computer Science and Engineering, Central South University, Changsha, China
  • Li Kuang School of Computer Science and Engineering, Central South University, Changsha, China

DOI: https://doi.org/10.1609/aaai.v40i7.37496

Abstract

Multimodal data significantly improves the performance of pretrained models, but its practical use is often limited by missing or incomplete data in one or more modalities. Existing methods that synthesize the missing data face two key challenges: (1) semantic inaccuracies caused by model hallucinations and (2) mismatches in distribution preferences between generated and original data. To address these challenges, we propose GFR, a novel three-stage multimodal data augmentation framework that generates, filters, and ranks missing-modality data. GFR leverages multimodal large models to generate diverse data, uses a scene graph matching-based filtering algorithm to ensure semantic consistency, and constructs a preference-aware ranking model that aligns the generated data with both the original data distribution and task relevance. The framework not only enhances the semantic diversity and consistency of the generated data but also captures the implicit characteristics of the original dataset and the target model. We demonstrate the effectiveness of GFR on multiple datasets under different missing types and missing ratios.
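To make the filter and rank stages concrete, the following is a minimal, hypothetical sketch and not the authors' implementation. It assumes the "Generate" stage has already produced candidate texts for a missing text modality (the paper uses multimodal large models for this), and that each candidate and the available modality have been parsed into scene graphs represented as sets of (subject, relation, object) triples. Filtering keeps candidates whose triple-overlap F1 with the reference graph clears an assumed threshold; ranking then orders survivors by a caller-supplied preference score, which stands in for the paper's learned preference-aware ranking model. The names `Candidate`, `triple_f1`, `generate_filter_rank`, `min_consistency`, and `top_k` are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

# Hypothetical scene graph representation: a set of (subject, relation, object) triples.
Triple = Tuple[str, str, str]


@dataclass
class Candidate:
    text: str             # generated content for the missing modality (assumed text here)
    triples: Set[Triple]  # scene graph assumed to be parsed from that content


def triple_f1(candidate: Set[Triple], reference: Set[Triple]) -> float:
    """F1 overlap between two scene graphs given as exact-match triple sets."""
    matched = len(candidate & reference)
    if matched == 0:
        return 0.0
    precision = matched / len(candidate)
    recall = matched / len(reference)
    return 2 * precision * recall / (precision + recall)


def generate_filter_rank(
    reference_triples: Set[Triple],
    candidates: List[Candidate],
    preference_score: Callable[[Candidate], float],
    min_consistency: float = 0.5,
    top_k: int = 1,
) -> List[Candidate]:
    # Filter: drop candidates whose scene graph disagrees with the available modality,
    # a crude proxy for rejecting hallucinated content.
    kept = [c for c in candidates if triple_f1(c.triples, reference_triples) >= min_consistency]
    # Rank: order survivors by the preference score (a stand-in for a learned ranker).
    kept.sort(key=preference_score, reverse=True)
    return kept[:top_k]


if __name__ == "__main__":
    image_graph = {("dog", "on", "grass"), ("dog", "holding", "frisbee")}
    candidates = [
        Candidate("A dog with a frisbee on the grass.",
                  {("dog", "on", "grass"), ("dog", "holding", "frisbee")}),
        Candidate("A cat on a sofa.", {("cat", "on", "sofa")}),
        Candidate("A dog on the grass.", {("dog", "on", "grass")}),
    ]
    # Hypothetical preference: favor captions close to an assumed dataset mean of 8 words.
    best = generate_filter_rank(image_graph, candidates,
                                lambda c: -abs(len(c.text.split()) - 8), top_k=2)
    print([c.text for c in best])
    # → ['A dog with a frisbee on the grass.', 'A dog on the grass.']
```

Exact triple matching is the simplest possible stand-in; a real system would likely need softer matching (for example, synonyms or embedding similarity between nodes and relations), and the hand-written length preference only illustrates where a model trained on the original data's distribution and the target task would plug in.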

Published

2026-03-14

How to Cite

Kuai, Z., Zhang, H., Yang, Y., Ma, Y., Huang, M., Gui, N., & Kuang, L. (2026). Generating-Filtering-Ranking: A Three-Stage MultiModal Data Augmentation Framework Under Partial Modality Missing. Proceedings of the AAAI Conference on Artificial Intelligence, 40(7), 5755–5763. https://doi.org/10.1609/aaai.v40i7.37496

Section

AAAI Technical Track on Computer Vision IV