MSAmba: Exploring Multimodal Sentiment Analysis with State Space Models

Xilin He; Haijian Liang; Boyi Peng; Weicheng Xie; Muhammad Haris Khan; Siyang Song; Zitong Yu

doi:10.1609/aaai.v39i2.32120

Authors

Xilin He Computer Vision Institute, School of Computer Science & Software Engineering, Shenzhen University
Haijian Liang Computer Vision Institute, School of Computer Science & Software Engineering, Shenzhen University
Boyi Peng Computer Vision Institute, School of Computer Science & Software Engineering, Shenzhen University
Weicheng Xie Computer Vision Institute, School of Computer Science & Software Engineering, Shenzhen University; Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen; Guangdong Provincial Key Laboratory of Intelligent Information Processing
Muhammad Haris Khan Mohamed Bin Zayed University of Artificial Intelligence
Siyang Song University of Exeter
Zitong Yu Great Bay University

DOI:

https://doi.org/10.1609/aaai.v39i2.32120

Abstract

Multimodal sentiment analysis, which trains a model to process multiple modalities simultaneously and predict a sentiment value, is an important area of affective computing. Modeling sequential intra-modal information and enhancing cross-modal interactions are both crucial to this task. In this paper, we propose MSAmba, a novel hybrid Mamba-based architecture for multimodal sentiment analysis. It consists of two core blocks, the Intra-Modal Sequential Mamba (ISM) block and the Cross-Modal Hybrid Mamba (CHM) block, which address these two challenges with hybrid state space models. First, the ISM block models the sequential information within each modality bidirectionally, with the assistance of global information. Next, the CHM blocks explicitly model centralized cross-modal interaction with a hybrid combination of Mamba and an attention mechanism to facilitate information fusion across modalities. Finally, the intra-modal and cross-modal tokens are learned jointly to predict the sentiment values. This paper is one of the first works to demonstrate the strong performance and research potential of Mamba-based methods for multimodal sentiment analysis. Experiments on CMU-MOSI, CMU-MOSEI and CH-SIMS demonstrate the superior performance of the proposed MSAmba over prior Transformer-based and CNN-based methods.
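
To make the idea of bidirectional sequence modeling with a state space model concrete, the following is a minimal sketch, not the authors' ISM block. It assumes a scalar linear recurrence h_t = a*h_{t-1} + b*x_t, y_t = c*h_t with fixed parameters, whereas Mamba uses input-dependent (selective) parameters over vector-valued states. The function names `ssm_scan` and `bidirectional_ssm`, the parameter values, and the choice to merge the two directions by summation are all hypothetical and chosen only for illustration.

```python
from typing import List


def ssm_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Run the recurrence h_t = a*h_{t-1} + b*x_t and emit y_t = c*h_t."""
    h = 0.0  # hidden state starts at zero
    ys = []
    for x_t in x:
        h = a * h + b * x_t
        ys.append(c * h)
    return ys


def bidirectional_ssm(
    x: List[float], a: float = 0.5, b: float = 1.0, c: float = 1.0
) -> List[float]:
    """Scan the sequence left-to-right and right-to-left, then sum the outputs.

    Summation is an assumption made for this sketch; other merges
    (concatenation, gating) are equally plausible.
    """
    forward = ssm_scan(x, a, b, c)
    # Reverse the input, scan, then reverse the output back into original order.
    backward = ssm_scan(x[::-1], a, b, c)[::-1]
    return [f + g for f, g in zip(forward, backward)]


if __name__ == "__main__":
    print(bidirectional_ssm([1.0, 0.0, 0.0]))
```

In this sketch, each output position mixes information from both earlier and later time steps, which is the property a bidirectional scan is meant to provide over a purely causal one.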
