G^2SAM: Graph-Based Global Semantic Awareness Method for Multimodal Sarcasm Detection
DOI:
https://doi.org/10.1609/aaai.v38i8.28766
Keywords:
DMKM: Mining of Visual, Multimedia & Multimodal Data, CV: Language and Vision, CV: Multi-modal Vision, KRR: Applications
Abstract
Multimodal sarcasm detection, which aims to detect ironic sentiment within multimodal social data, has gained substantial popularity in both the natural language processing and computer vision communities. Recently, graph-based studies that draw sentiment relations to detect multimodal sarcasm have made notable advances. However, they neglect to exploit graph-based global semantic congruity across existing instances to facilitate prediction, which ultimately hinders model performance. In this paper, we introduce a new inference paradigm that leverages global graph-based semantic awareness to handle this task. First, we construct a fine-grained multimodal graph for each instance and integrate these graphs into a semantic space to draw graph-based relations. During inference, we leverage global semantic congruity to retrieve the k-nearest neighbor instances in the semantic space as references that vote on the final prediction. To enhance the semantic correlation of representations in the semantic space, we also introduce label-aware graph contrastive learning, which further improves performance. Experimental results demonstrate that our model achieves state-of-the-art (SOTA) performance in multimodal sarcasm detection. The code will be available at https://github.com/upccpu/G2SAM.
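The two mechanisms named in the abstract, k-nearest-neighbor voting at inference time and a label-aware contrastive objective at training time, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each instance's multimodal graph has already been pooled into a plain embedding vector, uses cosine similarity as the notion of closeness in semantic space, and substitutes a generic supervised-contrastive-style loss for the paper's label-aware graph contrastive learning. The function names (`knn_vote`, `label_aware_contrastive_loss`), the toy vectors, and the temperature value are all hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_vote(query, bank, labels, k=3):
    """Predict a label by majority vote over the k stored instances most similar to `query`.

    `bank` holds embeddings of existing (training) instances; this is an assumed
    stand-in for the graph representations placed in semantic space.
    """
    ranked = sorted(range(len(bank)), key=lambda i: cosine(query, bank[i]), reverse=True)
    top = ranked[:k]
    votes = Counter(labels[i] for i in top)
    return votes.most_common(1)[0][0], top


def label_aware_contrastive_loss(embs, labels, temperature=0.1):
    """Supervised-contrastive-style loss (an assumption, not the paper's exact objective).

    For each anchor, instances with the same label are positives; the loss is low
    when positives are more similar to the anchor than the other instances are.
    """
    n = len(embs)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner contributes nothing
        logits = {j: cosine(embs[i], embs[j]) / temperature for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy data: label 1 = sarcastic, label 0 = non-sarcastic (made-up 2-D embeddings).
bank = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9], [0.8, 0.3]]
labels = [1, 1, 0, 0, 1]

prediction, neighbors = knn_vote([0.95, 0.15], bank, labels, k=3)
loss_aligned = label_aware_contrastive_loss(bank, labels)
loss_misaligned = label_aware_contrastive_loss(bank, [1, 0, 1, 0, 0])
```

In this toy setup the query lies close to the three label-1 embeddings, so the vote returns label 1, and the loss is smaller when the labels agree with the cluster structure than when they cut across it. An odd `k` avoids ties in the binary case; a weighted vote by similarity would be a natural variant, but whether the paper uses one is not stated in the abstract.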
Published
2024-03-24
How to Cite
Wei, Y., Yuan, S., Zhou, H., Wang, L., Yan, Z., Yang, R., & Chen, M. (2024). G^2SAM: Graph-Based Global Semantic Awareness Method for Multimodal Sarcasm Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 38(8), 9151-9159. https://doi.org/10.1609/aaai.v38i8.28766
Issue
Section
AAAI Technical Track on Data Mining & Knowledge Management