Mitigating Pervasive Modality Absence Through Multimodal Generalization and Refinement

Authors

  • Wuliang Huang, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Beijing Key Laboratory of Mobile Computing and Pervasive Device
  • Yiqiang Chen, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Peng Cheng Laboratory; Beijing Key Laboratory of Mobile Computing and Pervasive Device
  • Xinlong Jiang, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Beijing Key Laboratory of Mobile Computing and Pervasive Device
  • Chenlong Gao, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Beijing Key Laboratory of Mobile Computing and Pervasive Device
  • Teng Zhang, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Beijing Key Laboratory of Mobile Computing and Pervasive Device
  • Qian Chen, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Beijing Key Laboratory of Mobile Computing and Pervasive Device
  • Yifan Wang, Tsinghua Shenzhen International Graduate School, Tsinghua University

DOI:

https://doi.org/10.1609/aaai.v39i25.34883

Abstract

The performance of multimodal models often deteriorates when modality absence occurs. The absence disrupts the learned inter-modal correlations, resulting in biased multimodal representations. This challenge is especially pronounced when the absence is pervasive, affecting both the training and inference phases. Recent studies have attempted to reconstruct the missing information; however, most of them require complete supervision, which is seldom available in scenarios of pervasive absence, and the quality of reconstruction remains a critical issue. Alternatively, others aim to learn robust representations from the available modalities, but the substantial variations and biases are not fully addressed. This paper introduces the Multimodal Generalization and Refinement (MGR) framework to mitigate the issue of pervasive modality absence. MGR begins by acquiring generalized multimodal representations and iteratively refines them to recognize and calibrate the biased representations. Initially, multimodal samples with absence are embedded through foundation models, and MGR integrates independent unimodal features to further enhance generalization. Additionally, a novel mixed-context prompt is adopted to identify biases in both features and correlations. A redistribution operation then refines these biases through graph pooling, culminating in robust, calibrated multimodal representations suitable for downstream tasks. Comprehensive experiments on four benchmark datasets demonstrate that the proposed MGR framework outperforms state-of-the-art methods, effectively mitigating the impact of pervasive modality absence.

Published

2025-04-11

How to Cite

Huang, W., Chen, Y., Jiang, X., Gao, C., Zhang, T., Chen, Q., & Wang, Y. (2025). Mitigating Pervasive Modality Absence Through Multimodal Generalization and Refinement. Proceedings of the AAAI Conference on Artificial Intelligence, 39(25), 26796–26804. https://doi.org/10.1609/aaai.v39i25.34883

Section

AAAI Technical Track on Reasoning under Uncertainty