Semi-Supervised Multimodal Classification Through Learning from Modal and Strategic Complementarities

Authors

  • Junchi Chen (CCSE, School of Computer Science and Engineering, Beihang University, Beijing, China)
  • Richong Zhang (CCSE, School of Computer Science and Engineering, Beihang University, Beijing, China; Zhongguancun Laboratory, Beijing, China)
  • Junfan Chen (CCSE, School of Computer Science and Engineering, Beihang University, Beijing, China; School of Software, Beihang University, Beijing, China)

DOI:

https://doi.org/10.1609/aaai.v39i15.33736

Abstract

Supervised multimodal classification has been proven to outperform unimodal classification in the image-text domain. However, this task is highly dependent on abundant labeled data. To perform multimodal classification in data-insufficient scenarios, we explore semi-supervised multimodal classification (SSMC), which requires only a small amount of labeled data together with plenty of unlabeled data. Specifically, we first design baseline SSMC models by combining known semi-supervised pseudo-labeling methods with the two most commonly used modal fusion strategies, i.e., feature-level fusion and label-level aggregation. Based on our investigation and empirical study of the baselines, we discover two complementarities that may benefit SSMC if properly exploited: the complementarity between the predictions from different modalities (modal complementarity) and the complementarity between modal fusion strategies for pseudo-labeling (strategic complementarity). Therefore, we propose a Modal and Strategic Complementarity (MSC) framework for SSMC. Concretely, to exploit modal complementarity, we propose to learn reliability weights for the predictions from different modalities and use them to refine the fusion scores. To learn from strategic complementarity, we introduce a dual KL divergence loss that guides the balance between the quantity and quality of the selected pseudo-labeled data. Extensive empirical studies demonstrate the effectiveness of the proposed framework.
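
The abstract does not give formulas, so the following is only a minimal sketch of the ingredients it names, not the authors' MSC implementation. It illustrates label-level aggregation with per-modality weights, confidence-thresholded pseudo-labeling, and a two-directional KL term between the outputs of two fusion strategies. All function names, the fixed weights, the 0.9 threshold, and the choice of summing the KL divergence in both directions are assumptions made for illustration; in the paper the reliability weights are learned.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_level_aggregate(p_image, p_text, w_image=0.5, w_text=0.5):
    """Weighted average of per-modality class probabilities.

    The weights stand in for reliability weights; here they are fixed
    constants (an assumption), whereas the paper learns them.
    """
    mixed = [w_image * a + w_text * b for a, b in zip(p_image, p_text)]
    total = sum(mixed)
    return [x / total for x in mixed]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def dual_kl(p, q):
    """Sum of KL in both directions (a hypothetical form of a 'dual' KL term)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


def pseudo_label(probs, threshold=0.9):
    """Return the predicted class index if confident enough, else None."""
    confidence = max(probs)
    return probs.index(confidence) if confidence >= threshold else None


if __name__ == "__main__":
    p_image = softmax([2.0, 0.5, 0.1])      # image-branch prediction
    p_text = softmax([3.0, 0.2, 0.1])       # text-branch prediction
    p_feature = softmax([2.5, 0.3, 0.2])    # stand-in for a feature-level fusion head
    p_label = label_level_aggregate(p_image, p_text, w_image=0.3, w_text=0.7)
    print("pseudo-label:", pseudo_label(p_label, threshold=0.8))
    print("strategy disagreement:", dual_kl(p_feature, p_label))
```

In this sketch, a higher threshold keeps fewer but cleaner pseudo-labels, which is the quantity-versus-quality trade-off the abstract refers to; the two-directional KL value is simply a measure of how much the two fusion strategies disagree on a sample.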

Published

2025-04-11

How to Cite

Chen, J., Zhang, R., & Chen, J. (2025). Semi-Supervised Multimodal Classification Through Learning from Modal and Strategic Complementarities. Proceedings of the AAAI Conference on Artificial Intelligence, 39(15), 15812–15820. https://doi.org/10.1609/aaai.v39i15.33736

Section

AAAI Technical Track on Machine Learning I