MSAmba: Exploring Multimodal Sentiment Analysis with State Space Models

Authors

  • Xilin He, Computer Vision Institute, School of Computer Science & Software Engineering, Shenzhen University
  • Haijian Liang, Computer Vision Institute, School of Computer Science & Software Engineering, Shenzhen University
  • Boyi Peng, Computer Vision Institute, School of Computer Science & Software Engineering, Shenzhen University
  • Weicheng Xie, Computer Vision Institute, School of Computer Science & Software Engineering, Shenzhen University; Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen; Guangdong Provincial Key Laboratory of Intelligent Information Processing
  • Muhammad Haris Khan, Mohamed Bin Zayed University of Artificial Intelligence
  • Siyang Song, University of Exeter
  • Zitong Yu, Great Bay University

DOI:

https://doi.org/10.1609/aaai.v39i2.32120

Abstract

Multimodal sentiment analysis, which learns a model that processes multiple modalities simultaneously and predicts a sentiment value, is an important area of affective computing. Modeling sequential intra-modal information and enhancing cross-modal interactions are both crucial to this task. In this paper, we propose MSAmba, a novel hybrid Mamba-based architecture for multimodal sentiment analysis. It consists of two core blocks, the Intra-Modal Sequential Mamba (ISM) block and the Cross-Modal Hybrid Mamba (CHM) block, which address these two challenges with hybrid state space models. First, the ISM block models the sequential information within each modality bidirectionally, with the assistance of global information. Next, the CHM blocks explicitly model centralized cross-modal interaction with a hybrid combination of Mamba and an attention mechanism to facilitate information fusion across modalities. Finally, the intra-modal and cross-modal tokens are learned jointly to predict the sentiment values. This paper is one of the first works to reveal the strong performance and research potential of Mamba-based methods for multimodal sentiment analysis. Experiments on CMU-MOSI, CMU-MOSEI and CH-SIMS demonstrate the superior performance of the proposed MSAmba over prior Transformer-based and CNN-based methods.
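
To make the idea of bidirectional sequence modeling with a state space model concrete, the following is a minimal toy sketch, not the authors' ISM block. It assumes a fixed diagonal linear recurrence, whereas Mamba uses input-dependent (selective) parameters, and it leaves out the global information and the cross-modal fusion described in the abstract. The function names `ssm_scan` and `bidirectional_ssm` and the parameters `a`, `b`, `c` are hypothetical, chosen only for illustration.

```python
import numpy as np


def ssm_scan(x, a, b, c):
    """Run a diagonal linear state-space recurrence over one sequence.

    x: array of shape (T, D), one modality's feature sequence.
    a, b, c: arrays of shape (D,), per-channel parameters (assumed fixed here).
    Recurrence: h_t = a * h_{t-1} + b * x_t, output y_t = c * h_t.
    """
    h = np.zeros(x.shape[1])
    ys = np.empty_like(x, dtype=float)
    for t in range(x.shape[0]):
        h = a * h + b * x[t]  # update the hidden state with the current step
        ys[t] = c * h         # read out the state
    return ys


def bidirectional_ssm(x, a, b, c):
    """Sum a forward scan and a backward scan over the same sequence."""
    fwd = ssm_scan(x, a, b, c)
    # Scan the time-reversed sequence, then flip the result back into
    # the original time order so the two outputs align step by step.
    bwd = ssm_scan(x[::-1], a, b, c)[::-1]
    return fwd + bwd


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq = rng.normal(size=(5, 4))   # 5 time steps, 4 feature channels
    a = np.full(4, 0.9)             # state decay per channel
    b = np.ones(4)
    c = np.ones(4)
    print(bidirectional_ssm(seq, a, b, c).shape)
```

With this construction, each output step depends on both earlier and later steps of the sequence, which is the property that a bidirectional scan adds over a single causal pass.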

Published

2025-04-11

How to Cite

He, X., Liang, H., Peng, B., Xie, W., Khan, M. H., Song, S., & Yu, Z. (2025). MSAmba: Exploring Multimodal Sentiment Analysis with State Space Models. Proceedings of the AAAI Conference on Artificial Intelligence, 39(2), 1309–1317. https://doi.org/10.1609/aaai.v39i2.32120

Section

AAAI Technical Track on Cognitive Modeling & Cognitive Systems