LDS2AE: Local Diffusion Shared-Specific Autoencoder for Multimodal Remote Sensing Image Classification with Arbitrary Missing Modalities

Authors

  • Jiahui Qu, State Key Laboratory of Integrated Service Network, Xidian University, Xi’an 710071, China
  • Yuanbo Yang, State Key Laboratory of Integrated Service Network, Xidian University, Xi’an 710071, China
  • Wenqian Dong, State Key Laboratory of Integrated Service Network, Xidian University, Xi’an 710071, China
  • Yufei Yang, State Key Laboratory of Integrated Service Network, Xidian University, Xi’an 710071, China

DOI:

https://doi.org/10.1609/aaai.v38i13.29391

Keywords:

ML: Classification and Regression, ML: Deep Generative Models & Autoencoders

Abstract

Recent research on the joint classification of multimodal remote sensing data has achieved great success. In practice, however, imaging conditions often leave one or more modalities missing. Most previous work treats classification under each missing-modality pattern as an independent task, training a separate model for every fixed missing-modality case by extracting a joint multimodal representation; such models cannot handle arbitrary (including multiple and random) missing modalities. In this work, we propose a local diffusion shared-specific autoencoder (LDS2AE), which handles classification under arbitrary missing modalities with a single model. LDS2AE captures the data distribution of the different modalities and learns a multimodal shared feature for classification through a novel local diffusion autoencoder consisting of a modality-shared encoder and several modality-specific decoders. The modality-shared encoder uses the same parameters to map the data of every modality into a shared subspace, extracting the multimodal shared feature. The modality-specific decoders reconstruct the image of each modality from this shared feature, which encourages the shared feature to capture information unique to each modality. In addition, we incorporate masked training into the diffusion autoencoder to achieve local diffusion, which significantly reduces the training cost of the model. The approach is evaluated on widely used multimodal remote sensing datasets, demonstrating the effectiveness of the proposed LDS2AE for classification with arbitrary missing modalities. The code is available at https://github.com/Jiahuiqu/LDS2AE.
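To make the shared-specific idea concrete, the sketch below illustrates it in PyTorch: one encoder with shared parameters maps whichever modalities are present into a common subspace, and per-modality decoders reconstruct each modality from the fused shared feature. This is a minimal sketch under stated assumptions, not the authors' implementation: the layer shapes, the mean fusion of encoded features, and the assumption that every modality is pre-projected to a common channel count are all illustrative choices, and the masked (local) diffusion noise schedule is omitted for brevity.

```python
import torch
import torch.nn as nn

class SharedSpecificAE(nn.Module):
    """Hypothetical sketch of a shared-specific autoencoder: one
    modality-shared encoder, one modality-specific decoder per modality,
    and a classifier on the shared feature. Assumes every modality has
    already been projected to `in_ch` channels."""

    def __init__(self, in_ch: int, num_modalities: int,
                 num_classes: int, dim: int = 64):
        super().__init__()
        # Modality-shared encoder: the SAME parameters map every
        # modality into a common subspace.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
        )
        # Modality-specific decoders: each reconstructs its own modality
        # from the shared feature, pulling modality-unique information
        # into that feature during training.
        self.decoders = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
                nn.Conv2d(dim, in_ch, 3, padding=1),
            )
            for _ in range(num_modalities)
        )
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, inputs):
        # `inputs` is a list with one entry per modality; a missing
        # modality is passed as None and simply skipped, so a single
        # model covers any missing-modality pattern.
        feats = [self.encoder(x) for x in inputs if x is not None]
        shared = torch.stack(feats).mean(0)                # fuse shared features
        recons = [dec(shared) for dec in self.decoders]    # per-modality reconstruction
        logits = self.classifier(shared.mean(dim=(2, 3)))  # GAP then classify
        return logits, recons
```

Because absent modalities are skipped at encoding time while the decoders still reconstruct all modalities, the same trained parameters serve every missing-modality pattern, which is what lets a single model replace the per-pattern models of earlier approaches.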

Published

2024-03-24

How to Cite

Qu, J., Yang, Y., Dong, W., & Yang, Y. (2024). LDS2AE: Local Diffusion Shared-Specific Autoencoder for Multimodal Remote Sensing Image Classification with Arbitrary Missing Modalities. Proceedings of the AAAI Conference on Artificial Intelligence, 38(13), 14731-14739. https://doi.org/10.1609/aaai.v38i13.29391

Issue

Vol. 38 No. 13 (2024)

Section

AAAI Technical Track on Machine Learning IV