MSAmba: Exploring Multimodal Sentiment Analysis with State Space Models
DOI:
https://doi.org/10.1609/aaai.v39i2.32120
Abstract
Multimodal sentiment analysis, which learns a model to process multiple modalities simultaneously and predict a sentiment value, is an important area of affective computing. Modeling sequential intra-modal information and enhancing cross-modal interactions are both crucial to this task. In this paper, we propose MSAmba, a novel hybrid Mamba-based architecture for multimodal sentiment analysis. It consists of two core blocks, the Intra-Modal Sequential Mamba (ISM) block and the Cross-Modal Hybrid Mamba (CHM) block, which address these two challenges with hybrid state space models. First, the ISM block models the sequential information within each modality in a bi-directional manner with the assistance of global information. Subsequently, the CHM blocks explicitly model centralized cross-modal interaction with a hybrid combination of Mamba and attention mechanisms to facilitate information fusion across modalities. Finally, joint learning of the intra-modal tokens and cross-modal tokens is used to predict the sentiment values. This paper is one of the first works to demonstrate the strong performance and research potential of Mamba-based methods for multimodal sentiment analysis. Experiments on CMU-MOSI, CMU-MOSEI and CH-SIMS demonstrate the superior performance of the proposed MSAmba over prior Transformer-based and CNN-based methods.
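As a rough illustration of what modeling a sequence "in a bi-directional manner" with a state space model can mean, the sketch below runs a simple linear recurrence over a sequence in both directions and sums the two outputs. This is not the authors' ISM block and not Mamba itself: Mamba uses input-dependent (selective) parameters, whereas here the scalar parameters `a`, `b`, `c`, the function names, and the choice to sum the two directions are all assumptions made for illustration.

```python
from typing import List


def ssm_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Linear state space recurrence: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.

    The fixed scalars a, b, c are illustrative assumptions; a real
    Mamba layer would compute its parameters from the input.
    """
    h = 0.0
    ys = []
    for x_t in x:
        h = a * h + b * x_t  # update hidden state from the previous state and input
        ys.append(c * h)     # read out the current state
    return ys


def bidirectional_scan(x: List[float], a: float = 0.5,
                       b: float = 1.0, c: float = 1.0) -> List[float]:
    """Scan left-to-right and right-to-left, then sum the outputs per position.

    Summation is one possible way to merge directions (an assumption here).
    """
    forward = ssm_scan(x, a, b, c)
    # Reverse the input, scan, then reverse again to realign positions.
    backward = ssm_scan(x[::-1], a, b, c)[::-1]
    return [f + bk for f, bk in zip(forward, backward)]


if __name__ == "__main__":
    print(bidirectional_scan([1.0, 0.0, 0.0]))
```

With a forward-only scan, each position sees only earlier inputs; adding the reversed scan lets every position also draw on later inputs, which is the general motivation for bi-directional sequence modeling.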
Published
2025-04-11
How to Cite
He, X., Liang, H., Peng, B., Xie, W., Khan, M. H., Song, S., & Yu, Z. (2025). MSAmba: Exploring Multimodal Sentiment Analysis with State Space Models. Proceedings of the AAAI Conference on Artificial Intelligence, 39(2), 1309–1317. https://doi.org/10.1609/aaai.v39i2.32120
Section
AAAI Technical Track on Cognitive Modeling & Cognitive Systems