Hyper-Opinion Vagueness Quantification for Robust Multimodal Learning

Authors

  • Disen Hu, School of Computer Science and Technology, Tongji University, Shanghai, China; School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
  • Xun Jiang, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
  • Xiaofeng Cao, School of Computer Science and Technology, Tongji University, Shanghai, China
  • Zheng Wang, School of Computer Science and Technology, Tongji University, Shanghai, China
  • Jingkuan Song, School of Computer Science and Technology, Tongji University, Shanghai, China
  • Heng Tao Shen, School of Computer Science and Technology, Tongji University, Shanghai, China
  • Xing Xu, School of Computer Science and Technology, Tongji University, Shanghai, China

DOI:

https://doi.org/10.1609/aaai.v40i26.39335

Abstract

Robust Multimodal Learning (RML) aims to address the unreliable predictions of multimodal models. Nevertheless, previous RML works often struggle to distinguish between categories that rely on identical intra-modal cues, and therefore make ambiguous predictions. We define vagueness as the degree of uncertainty a multimodal model exhibits when extracting discriminative features. Neglecting such vagueness, as previous RML works commonly do, undermines a multimodal model's ability to extract the unique semantics of each category, which in turn reduces robustness under disturbances that affect semantic representations. Additionally, this vagueness drives parameter updates towards unreliable fusion, diverting the model from learning the unique features of each category. Based on this insight, we propose a novel robust multimodal learning approach, termed Hyper-Opinion Quantifying Vagueness (HOQV). Specifically, we first introduce hyper-opinion to capture and quantify the vagueness of multimodal learning in discriminating between representations of different categories. Moreover, to mitigate the interference that unreliable, high-vagueness representations cause in parameter updating, we design Hyper-Opinion Gradient Modulation to guide the optimization process. We evaluate HOQV on six datasets under different disturbances, including noise and adversarial attacks, and demonstrate that it consistently achieves state-of-the-art performance.
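
As a rough illustration of what quantifying vagueness can look like, the sketch below follows standard subjective logic, in which a hyper-opinion assigns belief mass to sets of classes and the mass placed on multi-class sets is treated as vague. This is an assumption made for illustration only: the abstract does not specify how HOQV computes vagueness, and the function names, class labels, and numbers here are hypothetical.

```python
def total_vagueness(belief):
    """Sum the belief mass assigned to composite (multi-class) subsets."""
    return sum(mass for subset, mass in belief.items() if len(subset) > 1)


def uncertainty_mass(belief):
    """Mass left unassigned to any subset (beliefs plus uncertainty sum to 1)."""
    return 1.0 - sum(belief.values())


# Hypothetical hyper-opinion: keys are sets of class labels, values are belief masses.
belief = {
    frozenset({"cat"}): 0.5,           # sharp belief in a single class
    frozenset({"dog"}): 0.125,         # sharp belief in a single class
    frozenset({"cat", "dog"}): 0.25,   # vague belief: evidence cannot separate the two
}

vagueness = total_vagueness(belief)     # mass on the {"cat", "dog"} subset only
uncertainty = uncertainty_mass(belief)  # remaining unassigned mass
```

In this toy case the vague mass is the 0.25 placed on the two-class set, which is distinct from the 0.125 of mass that is not committed to any set at all.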

Published

2026-03-14

How to Cite

Hu, D., Jiang, X., Cao, X., Wang, Z., Song, J., Shen, H. T., & Xu, X. (2026). Hyper-Opinion Vagueness Quantification for Robust Multimodal Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(26), 21831–21839. https://doi.org/10.1609/aaai.v40i26.39335

Section

AAAI Technical Track on Machine Learning III