Revisiting Multimodal Emotion Recognition in Conversation from the Perspective of Graph Spectrum

Authors

  • Wei Ai Central South University of Forestry and Technology
  • Fuchen Zhang Central South University of Forestry and Technology
  • Yuntao Shou Central South University of Forestry and Technology
  • Tao Meng Central South University of Forestry and Technology
  • Haowen Chen Hunan University
  • Keqin Li State University of New York at New Paltz

DOI:

https://doi.org/10.1609/aaai.v39i11.33242

Abstract

Efficiently capturing consistent and complementary semantic features in context is crucial for Multimodal Emotion Recognition in Conversations (MERC). However, spatial graph neural networks, limited by their over-smoothing or low-pass filtering characteristics, are insufficient to accurately capture the long-distance low-frequency consistency information and high-frequency complementarity information of utterances. To this end, this paper revisits the MERC task from the perspective of the graph spectrum and proposes a Graph-Spectrum-based Multimodal Consistency and Complementary collaborative learning framework, GS-MCC. First, GS-MCC uses a sliding window to construct a multimodal interaction graph that models conversational relationships, and designs efficient Fourier graph operators (FGO) to extract long-distance high-frequency and low-frequency information, respectively. FGOs can be stacked in multiple layers, which effectively alleviates the over-smoothing problem. Then, GS-MCC uses contrastive learning to construct self-supervised signals that promote collaboration between the complementary and consistent semantics carried by the high- and low-frequency signals, thereby improving the ability of the high- and low-frequency information to reflect genuine emotions. Finally, GS-MCC feeds the coordinated high- and low-frequency information into an MLP network and a softmax function for emotion prediction. Extensive experiments demonstrate the superiority of the proposed GS-MCC architecture on two benchmark datasets.
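The abstract's split of utterance features into low-frequency (consistency) and high-frequency (complementarity) bands can be illustrated with standard graph spectral filtering: project node features onto the eigenbasis of the normalized graph Laplacian and partition the spectrum at an eigenvalue cutoff. This is only a minimal NumPy sketch of the underlying idea, assuming a fixed spectral split and a toy graph; the paper's FGO is a learned operator and its actual formulation is not given here.

```python
import numpy as np

def spectral_split(adj, feats, cutoff=0.5):
    """Split node features into low- and high-frequency components using
    the normalized graph Laplacian's eigenbasis (textbook graph spectral
    filtering; GS-MCC's learned FGO is not reproduced here)."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    lap = np.eye(len(adj)) - d_inv_sqrt @ adj @ d_inv_sqrt
    eigvals, eigvecs = np.linalg.eigh(lap)   # eigenvalues lie in [0, 2]
    spec = eigvecs.T @ feats                 # graph Fourier transform
    low_mask = (eigvals < cutoff)[:, None]   # small eigenvalues = smooth signal
    low = eigvecs @ (spec * low_mask)        # inverse transform of the low band
    high = eigvecs @ (spec * ~low_mask)      # complementary high band
    return low, high

# Toy 4-utterance chain graph with 2-dimensional features (hypothetical data).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
feats = np.random.default_rng(0).normal(size=(4, 2))
low, high = spectral_split(adj, feats)
# Because the eigenbasis is orthonormal, the two bands reconstruct the input.
assert np.allclose(low + high, feats)
```

The low band captures signal components that vary slowly across connected utterances (consistency), while the high band captures sharp inter-utterance differences (complementarity).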

Published

2025-04-11

How to Cite

Ai, W., Zhang, F., Shou, Y., Meng, T., Chen, H., & Li, K. (2025). Revisiting Multimodal Emotion Recognition in Conversation from the Perspective of Graph Spectrum. Proceedings of the AAAI Conference on Artificial Intelligence, 39(11), 11418–11426. https://doi.org/10.1609/aaai.v39i11.33242

Section

AAAI Technical Track on Data Mining & Knowledge Management I