Hyper-Opinion Vagueness Quantification for Robust Multimodal Learning

Disen Hu; Xun Jiang; Xiaofeng Cao; Zheng Wang; Jingkuan Song; Heng Tao Shen; Xing Xu

doi:10.1609/aaai.v40i26.39335

Authors

Disen Hu, School of Computer Science and Technology, Tongji University, Shanghai, China; School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
Xun Jiang, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
Xiaofeng Cao, School of Computer Science and Technology, Tongji University, Shanghai, China
Zheng Wang, School of Computer Science and Technology, Tongji University, Shanghai, China
Jingkuan Song, School of Computer Science and Technology, Tongji University, Shanghai, China
Heng Tao Shen, School of Computer Science and Technology, Tongji University, Shanghai, China
Xing Xu, School of Computer Science and Technology, Tongji University, Shanghai, China

DOI:

https://doi.org/10.1609/aaai.v40i26.39335

Abstract

Robust Multimodal Learning (RML) aims to address the unreliable predictions of multimodal models. Nevertheless, previous RML works often struggle to distinguish between categories that rely on identical intra-modal cues, and therefore make ambiguous predictions. We define this degree of uncertainty in extracting discriminative features of a multimodal model as vagueness. Neglecting such vagueness, as previous RML works commonly do, undermines the ability of multimodal models to extract the unique semantics of each category, which in turn reduces robustness under disturbances that affect semantic representations. Additionally, this vagueness steers parameter updates towards unreliable fusion, diverting the multimodal model from learning the unique features of each category. Based on this insight, we propose a novel robust multimodal learning approach, termed Hyper-Opinion Quantifying Vagueness (HOQV). Specifically, we first introduce hyper-opinion to capture and quantify the vagueness of multimodal learning in discriminating representations of different categories. Moreover, to mitigate the interference that unreliable, high-vagueness representations cause in parameter updating, we design Hyper-Opinion Gradient Modulation to guide the optimization process. We evaluate HOQV on six datasets under different disturbances, including noise and adversarial attacks, and demonstrate that the proposed method consistently achieves state-of-the-art performance.
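
The abstract does not give the formulas behind the vagueness measure, so the following is only an illustrative sketch of the general idea, not the HOQV implementation. It assumes the standard subjective-logic reading of a hyper-opinion: evidence may be assigned to single categories or to sets of several categories, belief mass is evidence divided by the total evidence plus a prior weight, and the total vague belief mass is the share held by the multi-category sets. The function name `hyper_opinion_masses`, the `prior_weight` default of 2, and the toy evidence values are all assumptions made for this example.

```python
from typing import Dict, FrozenSet, Tuple


def hyper_opinion_masses(
    evidence: Dict[FrozenSet[str], float],
    prior_weight: float = 2.0,  # assumed non-informative prior weight
) -> Tuple[float, float, float]:
    """Return (sharp, vague, uncertainty) masses of a hyper-opinion.

    `evidence` maps a non-empty set of category labels to non-negative
    evidence. Singleton sets carry sharp evidence for one category;
    sets with several labels carry evidence that cannot tell those
    categories apart, which is what contributes to vagueness.
    """
    if prior_weight <= 0:
        raise ValueError("prior_weight must be positive")
    for labels, amount in evidence.items():
        if len(labels) == 0:
            raise ValueError("evidence keys must be non-empty sets")
        if amount < 0:
            raise ValueError("evidence must be non-negative")

    total = prior_weight + sum(evidence.values())
    # Belief mass on single categories.
    sharp = sum(r for s, r in evidence.items() if len(s) == 1) / total
    # Belief mass on composite (multi-category) sets.
    vague = sum(r for s, r in evidence.items() if len(s) > 1) / total
    # Mass left uncommitted because of limited evidence.
    uncertainty = prior_weight / total
    return sharp, vague, uncertainty


if __name__ == "__main__":
    # Toy example: some evidence supports "cat" or "dog" specifically,
    # and some only supports "cat or dog" without separating them.
    toy_evidence = {
        frozenset({"cat"}): 4.0,
        frozenset({"dog"}): 1.0,
        frozenset({"cat", "dog"}): 3.0,
    }
    sharp, vague, uncertainty = hyper_opinion_masses(toy_evidence)
    print(f"sharp={sharp:.2f} vague={vague:.2f} uncertainty={uncertainty:.2f}")
```

In this toy setting the three masses sum to one, and a larger share of evidence on the shared set `{"cat", "dog"}` raises the vague mass. How HOQV obtains such evidence from multimodal features, and how the Hyper-Opinion Gradient Modulation uses the result, is described in the paper itself and is not reproduced here.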
