Mitigating Pervasive Modality Absence Through Multimodal Generalization and Refinement

Authors

  • Wuliang Huang, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Beijing Key Laboratory of Mobile Computing and Pervasive Device
  • Yiqiang Chen, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Peng Cheng Laboratory; Beijing Key Laboratory of Mobile Computing and Pervasive Device
  • Xinlong Jiang, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Beijing Key Laboratory of Mobile Computing and Pervasive Device
  • Chenlong Gao, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Beijing Key Laboratory of Mobile Computing and Pervasive Device
  • Teng Zhang, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Beijing Key Laboratory of Mobile Computing and Pervasive Device
  • Qian Chen, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Beijing Key Laboratory of Mobile Computing and Pervasive Device
  • Yifan Wang, Tsinghua Shenzhen International Graduate School, Tsinghua University

DOI:

https://doi.org/10.1609/aaai.v39i25.34883

Abstract

The performance of multimodal models often deteriorates when modality absence occurs. The absence disrupts the learned inter-modal correlations, resulting in biased multimodal representations. This challenge is especially pronounced when the absence is pervasive, affecting both the training and inference phases. Recent studies have attempted to reconstruct the missing information; however, most of them require complete supervision, which is seldom available in scenarios of pervasive absence, and the quality of reconstruction remains a critical issue. Alternatively, others aim to learn robust representations from the available modalities, but the substantial variations and biases are not fully addressed. This paper introduces the Multimodal Generalization and Refinement (MGR) framework to mitigate the issue of pervasive modality absence. MGR begins by acquiring generalized multimodal representations and iteratively refines them to recognize and calibrate the biased representations. Initially, multimodal samples with absence are embedded through foundation models, and MGR integrates independent unimodal features to further enhance generalization. Additionally, a novel mixed-context prompt is adopted to identify biases in both features and correlations. A redistribution operation then refines these biases through graph pooling, culminating in robust, calibrated multimodal representations suitable for downstream tasks. Comprehensive experiments on four benchmark datasets demonstrate that the proposed MGR framework outperforms state-of-the-art methods, effectively mitigating the impact of pervasive modality absence.

Published

2025-04-11

How to Cite

Huang, W., Chen, Y., Jiang, X., Gao, C., Zhang, T., Chen, Q., & Wang, Y. (2025). Mitigating Pervasive Modality Absence Through Multimodal Generalization and Refinement. Proceedings of the AAAI Conference on Artificial Intelligence, 39(25), 26796–26804. https://doi.org/10.1609/aaai.v39i25.34883

Section

AAAI Technical Track on Reasoning under Uncertainty