Multi-to-Single: Reducing Multimodal Dependency in Emotion Recognition Through Contrastive Learning

Authors

  • Yan-Kai Liu, Shanghai Jiao Tong University
  • Jinyu Cai, Shanghai Jiao Tong University
  • Bao-Liang Lu, Shanghai Jiao Tong University
  • Wei-Long Zheng, Shanghai Jiao Tong University

DOI:

https://doi.org/10.1609/aaai.v39i2.32134

Abstract

Multimodal emotion recognition is a crucial research area in the field of affective brain-computer interfaces. However, in practical applications, it is often challenging to obtain all modalities simultaneously. To deal with this problem, researchers have focused on cross-modal methods that learn multimodal representations from fewer modalities. However, due to the significant differences in the distributions of different modalities, it is challenging for any single modality to fully learn multimodal features. To address this limitation, we propose a Multi-to-Single (M2S) emotion recognition model that leverages contrastive learning and incorporates two innovative modules: 1) a spatial and temporal-sparse (STS) attention mechanism that enhances the encoders' ability to extract features from data; 2) a novel Multi-to-Multi Contrastive Predictive Coding (M2M CPC) that learns and fuses features across different modalities. At test time, we use only a single modality for emotion recognition, reducing the dependence on multimodal data. Extensive experiments on five public multimodal emotion datasets demonstrate that our model achieves state-of-the-art performance on cross-modal tasks and maintains multimodal-level performance using only a single modality.
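The abstract's M2M CPC module builds on the contrastive-predictive-coding family of objectives, which align paired embeddings from different modalities via an InfoNCE-style loss. As a rough, hedged illustration of that objective family (not the paper's actual implementation; the function name, shapes, and temperature are illustrative assumptions), a minimal NumPy sketch might look like:

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """Illustrative InfoNCE-style contrastive loss between two modalities.

    z_a, z_b: (N, D) embedding matrices; row i of z_a and row i of z_b
    form a positive pair, and all other rows in the batch act as negatives.
    (Shapes and temperature are assumptions for illustration, not taken
    from the M2S paper.)
    """
    # L2-normalize so the dot product becomes cosine similarity
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature           # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives sit on the diagonal; minimize their negative log-likelihood
    return -np.mean(np.diag(log_prob))
```

Intuitively, the loss is low when each sample's two modality embeddings are more similar to each other than to any other sample's embeddings in the batch, which is how such objectives pull cross-modal representations of the same emotion instance together.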

Published

2025-04-11

How to Cite

Liu, Y.-K., Cai, J., Lu, B.-L., & Zheng, W.-L. (2025). Multi-to-Single: Reducing Multimodal Dependency in Emotion Recognition Through Contrastive Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 39(2), 1438–1446. https://doi.org/10.1609/aaai.v39i2.32134

Section

AAAI Technical Track on Cognitive Modeling & Cognitive Systems