G^2SAM: Graph-Based Global Semantic Awareness Method for Multimodal Sarcasm Detection

Yiwei Wei; Shaozu Yuan; Hengyang Zhou; Longbiao Wang; Zhiling Yan; Ruosong Yang; Meng Chen

doi:10.1609/aaai.v38i8.28766

Authors

Yiwei Wei: Tianjin Key Laboratory of Cognitive Computing and Application, College of Intelligence and Computing, Tianjin University; China University of Petroleum (Beijing) at Karamay
Shaozu Yuan: JD AI Research
Hengyang Zhou: China University of Petroleum (Beijing) at Karamay
Longbiao Wang: Tianjin Key Laboratory of Cognitive Computing and Application, College of Intelligence and Computing, Tianjin University; Huiyan Technology (Tianjin) Co., Ltd
Zhiling Yan: JD AI Research
Ruosong Yang: JD AI Research
Meng Chen: Yep AI

DOI:

https://doi.org/10.1609/aaai.v38i8.28766

Keywords:

DMKM: Mining of Visual, Multimedia & Multimodal Data, CV: Language and Vision, CV: Multi-modal Vision, KRR: Applications

Abstract

Multimodal sarcasm detection, which aims to detect ironic sentiment in multimodal social data, has gained substantial popularity in both the natural language processing and computer vision communities. Recently, graph-based studies that draw sentiment relations to detect multimodal sarcasm have made notable advances. However, they neglect to exploit graph-based global semantic congruity from existing instances to facilitate prediction, which ultimately hinders model performance. In this paper, we introduce a new inference paradigm that leverages global graph-based semantic awareness to handle this task. First, we construct a fine-grained multimodal graph for each instance and integrate these graphs into a semantic space to draw graph-based relations. During inference, we leverage global semantic congruity to retrieve the k-nearest neighbor instances in the semantic space as references that vote on the final prediction. To enhance the semantic correlation of representations in the semantic space, we also introduce label-aware graph contrastive learning, which further improves performance. Experimental results demonstrate that our model achieves state-of-the-art (SOTA) performance in multimodal sarcasm detection. The code will be available at https://github.com/upccpu/G2SAM.
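
The inference step described in the abstract (retrieve the k nearest stored instances in semantic space and let them vote) can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything here is an assumption for illustration: the names `cosine`, `knn_vote`, and `bank` are hypothetical, cosine similarity stands in for whatever distance the authors use, the vote is an unweighted majority, and the 2-dimensional vectors are toy stand-ins for graph-level embeddings.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_vote(query, bank, k=3):
    """Predict a label by majority vote over the k most similar stored instances.

    bank is a list of (embedding, label) pairs. This is an assumed, simplified
    stand-in for the stored instances in semantic space, not the authors' code.
    """
    ranked = sorted(bank, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy memory bank: label 1 = sarcastic, label 0 = not sarcastic (assumed encoding).
bank = [
    ([0.9, 0.1], 1),
    ([0.8, 0.2], 1),
    ([0.1, 0.9], 0),
    ([0.2, 0.8], 0),
    ([0.7, 0.3], 1),
]

prediction = knn_vote([0.85, 0.15], bank, k=3)
print(prediction)
```

With binary labels, an odd `k` avoids tied votes in this unweighted scheme; a similarity-weighted vote would be another reasonable choice.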
