DEQA: Descriptions Enhanced Question-Answering Framework for Multimodal Aspect-Based Sentiment Analysis

Authors

  • Zhixin Han (College of Software, Nankai University)
  • Mengting Hu (College of Software, Nankai University)
  • Yinhao Bai (JD AI Research, Beijing, China)
  • Xunzhi Wang (College of Software, Nankai University)
  • Bitong Luo (College of Software, Nankai University)

DOI

https://doi.org/10.1609/aaai.v39i22.34572

Abstract

Multimodal aspect-based sentiment analysis (MABSA) integrates text and images to perform fine-grained sentiment analysis on specific aspects, enhancing the understanding of user opinions in various applications. Existing methods rely on modality alignment for information interaction and fusion between images and text, but the inherent gap between the two modalities calls for a more direct mechanism to connect image understanding with text content. To this end, we propose the Descriptions Enhanced Question-Answering Framework (DEQA), which uses GPT-4, a multimodal large language model, to generate image descriptions that provide more direct semantic context for the images. To help the model better understand the task's purpose, DEQA frames MABSA as a multi-turn question-answering problem, which adds semantic guidance and hints. Text, image, and description are fed to separate experts in various combinations, so that each expert focuses on different features and the input information is used more comprehensively. The expert outputs are integrated within the multi-turn question-answering format, and a multi-expert ensemble decision-making approach produces the final prediction. Experimental results on two widely used datasets demonstrate that our method achieves state-of-the-art performance. Furthermore, our framework substantially outperforms GPT-4o and other multimodal large language models, showcasing its effectiveness in multimodal sentiment analysis.
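
The ensemble step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how expert outputs are combined, so the example assumes each expert returns a probability per sentiment label and that the ensemble simply sums those probabilities and picks the highest-scoring label. The label set, the function name `ensemble_decision`, the expert names in the comments, and the example scores are all hypothetical.

```python
from typing import Dict, List

# Assumed label set; the abstract does not list the sentiment classes.
LABELS = ("positive", "neutral", "negative")


def ensemble_decision(expert_probs: List[Dict[str, float]]) -> str:
    """Sum per-label probabilities across experts and return the top label.

    Summed-probability voting is an assumption made for illustration only;
    it stands in for whatever decision rule the paper actually uses.
    """
    totals = {label: 0.0 for label in LABELS}
    for probs in expert_probs:
        for label in LABELS:
            # A label missing from an expert's output counts as zero.
            totals[label] += probs.get(label, 0.0)
    # On ties, max() keeps the label that appears first in LABELS.
    return max(LABELS, key=lambda label: totals[label])


# Hypothetical outputs from three experts for one aspect, each fed a
# different combination of inputs.
expert_outputs = [
    {"positive": 0.6, "neutral": 0.3, "negative": 0.1},  # text + description
    {"positive": 0.2, "neutral": 0.5, "negative": 0.3},  # text + image
    {"positive": 0.5, "neutral": 0.4, "negative": 0.1},  # text + image + description
]

prediction = ensemble_decision(expert_outputs)
```

In this toy case two of the three experts lean positive, and the combined score follows them even though the second expert prefers neutral.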

Published

2025-04-11

How to Cite

Han, Z., Hu, M., Bai, Y., Wang, X., & Luo, B. (2025). DEQA: Descriptions Enhanced Question-Answering Framework for Multimodal Aspect-Based Sentiment Analysis. Proceedings of the AAAI Conference on Artificial Intelligence, 39(22), 23987–23995. https://doi.org/10.1609/aaai.v39i22.34572

Section

AAAI Technical Track on Natural Language Processing I