Multimodal Gaussian Mixture Variational Autoencoder with Consistency Regularizations

Yarui Chen; Lehan Hong; Jianlin Shao; Jianning Yang; Tingting Zhao; Yun Liao; Yancui Shi

doi:10.1609/aaai.v40i4.37302

Authors

Yarui Chen, Tianjin University of Science and Technology
Lehan Hong, Tianjin University of Science and Technology
Jianlin Shao, Tianjin University of Science and Technology
Jianning Yang, Xi'an University of Posts & Telecommunications
Tingting Zhao, Tianjin University of Science and Technology
Yun Liao, Tianjin University of Science and Technology
Yancui Shi, Tianjin University of Science and Technology

Abstract

Variational autoencoder (VAE)-based frameworks possess a natural advantage in modeling the shared and private information inherent in multimodal data. However, current models focus on improving the quality of shared representations from the reconstruction perspective and lack explicit mechanisms to model their underlying semantic structure. In this paper, we propose the multimodal Gaussian mixture variational autoencoder with consistency regularizations, which introduces a Gaussian mixture prior over the shared latent space to enhance its semantic structure and encourage the formation of cluster-aware latent representations. To address the cross-modal inconsistency problem under missing-modality conditions, we propose a cluster-guided regularization strategy that enforces cross-modal consistency using pseudo-category labels obtained from unsupervised clustering. Additionally, we design a self-supervised contrastive regularization strategy to align semantically similar representations across modalities. Extensive experiments on the MNIST-SVHN and MNIST-CDCB datasets demonstrate that our method significantly outperforms prior state-of-the-art models in generation, classification, and retrieval tasks.
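
The abstract does not give the exact loss functions, so the following is only an illustrative sketch of two ingredients it names: a Gaussian mixture prior over a shared latent vector, and a contrastive term that pulls paired representations from two modalities together. Everything concrete here is an assumption made for illustration and is not taken from the paper: diagonal covariances, cosine similarity, a symmetric InfoNCE form, the temperature value, and all function names (gmm_log_prior, cluster_posterior, info_nce).

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def diag_gaussian_log_pdf(z, mean, var):
    """Log density of a Gaussian with diagonal covariance (assumed form)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(z, mean, var)
    )


def component_log_joint(z, weights, means, variances):
    """log(pi_k) + log N(z | mu_k, diag(var_k)) for every component k."""
    return [
        math.log(w) + diag_gaussian_log_pdf(z, m, v)
        for w, m, v in zip(weights, means, variances)
    ]


def gmm_log_prior(z, weights, means, variances):
    """Log density of z under the mixture: log sum_k pi_k N(z | mu_k, var_k)."""
    return log_sum_exp(component_log_joint(z, weights, means, variances))


def cluster_posterior(z, weights, means, variances):
    """Responsibilities p(k | z); their argmax could serve as a pseudo-label."""
    logs = component_log_joint(z, weights, means, variances)
    total = log_sum_exp(logs)
    return [math.exp(l - total) for l in logs]


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(za, zb, temperature=0.1):
    """Symmetric InfoNCE over a batch where za[i] and zb[i] are a positive pair."""
    n = len(za)
    sims = [[cosine(a, b) / temperature for b in zb] for a in za]
    # Modality A -> B: each row is a softmax over candidates in B.
    loss_ab = -sum(sims[i][i] - log_sum_exp(sims[i]) for i in range(n)) / n
    # Modality B -> A: each column is a softmax over candidates in A.
    cols = [[sims[i][j] for i in range(n)] for j in range(n)]
    loss_ba = -sum(cols[j][j] - log_sum_exp(cols[j]) for j in range(n)) / n
    return 0.5 * (loss_ab + loss_ba)


if __name__ == "__main__":
    weights = [0.5, 0.5]
    means = [[0.0, 0.0], [4.0, 4.0]]
    variances = [[1.0, 1.0], [1.0, 1.0]]
    z = [0.1, -0.2]
    print("log prior:", gmm_log_prior(z, weights, means, variances))
    print("responsibilities:", cluster_posterior(z, weights, means, variances))
    print("contrastive loss:", info_nce([[1.0, 0.0], [0.0, 1.0]],
                                        [[0.9, 0.1], [0.2, 0.8]]))
```

In a sketch like this, the responsibilities returned by cluster_posterior for one modality's latent vector could act as soft pseudo-category labels for another modality's vector, which is one plausible reading of the cluster-guided regularization; the paper itself should be consulted for the actual formulation.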
