Semi-Supervised Multi-Modal Learning with Balanced Spectral Decomposition

Peng Hu; Hongyuan Zhu; Xi Peng; Jie Lin

doi:10.1609/aaai.v34i01.5339

Authors

Peng Hu (Institute for Infocomm Research, Agency for Science, Technology and Research)
Hongyuan Zhu (Institute for Infocomm Research, Agency for Science, Technology and Research)
Xi Peng (Sichuan University)
Jie Lin (Institute for Infocomm Research, Agency for Science, Technology and Research)

Abstract

Cross-modal retrieval aims to retrieve relevant samples across different modalities. The key problem is how to model the correlations among modalities while narrowing the large heterogeneity gap between them. In this paper, we propose a Semi-supervised Multimodal Learning Network (SMLN), which correlates different modalities by capturing the intrinsic structure and discriminative correlation of multimedia data. Specifically, labeled and unlabeled data are used to construct a similarity matrix that integrates the cross-modal correlation, discrimination, and intra-modal graph information present in the multimedia data. More importantly, we propose a novel approach for optimizing our loss within a neural network, which involves a spectral decomposition problem derived from a ratio trace criterion. This optimization has two advantages. First, it is not limited to our loss: it can be applied to any neural network trained with a ratio trace criterion. Second, it differs from existing approaches, which alternately maximize the minor eigenvalues and thus overemphasize them while ignoring the dominant ones. In contrast, our method balances all eigenvalues, making it more competitive than existing methods. Thanks to our loss and optimization strategy, our method preserves the discriminative and intrinsic information in the common space and scales to large-scale multimedia data. To verify the effectiveness of the proposed method, we carry out extensive experiments on three widely used multimodal datasets, comparing with 13 state-of-the-art approaches.
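
For readers unfamiliar with the ratio trace criterion mentioned in the abstract, the following is a minimal sketch of its classical closed-form solution, not the balanced optimization proposed in the paper. It assumes two symmetric matrices, here given the hypothetical names `S_b` (the quantity to maximize) and `S_w` (the quantity to normalize by, assumed positive semidefinite), and finds a projection `W` maximizing tr((WᵀS_wW)⁻¹ WᵀS_bW) by reducing the generalized eigenproblem S_b w = λ S_w w to an ordinary symmetric one. The function name and the `eps` regularizer are also assumptions made for illustration.

```python
import numpy as np


def ratio_trace_projection(S_b, S_w, k, eps=1e-6):
    """Classical ratio trace solution (illustrative, not the paper's method).

    Returns W (d x k) maximizing tr((W^T S_w W)^{-1} W^T S_b W),
    together with the k largest generalized eigenvalues.
    """
    d = S_w.shape[0]
    # Regularize S_w so that its Cholesky factor L (S_w ~ L L^T) exists.
    L = np.linalg.cholesky(S_w + eps * np.eye(d))
    # Whitened matrix M = L^{-1} S_b L^{-T}, computed without explicit inverses.
    M = np.linalg.solve(L, np.linalg.solve(L, S_b).T).T
    M = 0.5 * (M + M.T)  # remove round-off asymmetry
    evals, evecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:k]  # indices of the k largest
    # Map eigenvectors back to the original space: W = L^{-T} V.
    W = np.linalg.solve(L.T, evecs[:, top])
    return W, evals[top]
```

With this `W`, WᵀS_wW is approximately the identity, so the criterion value equals the sum of the returned eigenvalues. The abstract's point is that training a network against such a criterion requires care about how the individual eigenvalues are weighted, which this closed-form sketch does not address.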
