Mutual-Enhanced Incongruity Learning Network for Multi-Modal Sarcasm Detection
Keywords: ML: Multimodal Learning, SNLP: Sentiment Analysis and Stylistic Analysis
Abstract
Sarcasm is a sophisticated linguistic phenomenon that is prevalent on today's social media platforms. Multi-modal sarcasm detection aims to identify whether a given sample with multi-modal information (i.e., text and image) is sarcastic. The key to this task lies in capturing both inter- and intra-modal incongruities within the same context. Although existing methods have achieved compelling success, they are either disturbed by irrelevant information extracted from the whole image and text or overlook important information because of incomplete input. To address these limitations, we propose a Mutual-enhanced Incongruity Learning Network for multi-modal sarcasm detection, named MILNet. In particular, we design a local semantic-guided incongruity learning module and a global incongruity learning module. Moreover, we introduce a mutual enhancement module that exploits the underlying consistency between the two modules to boost performance. Extensive experiments on a widely used dataset demonstrate the superiority of our model over cutting-edge methods.
How to Cite
Qiao, Y., Jing, L., Song, X., Chen, X., Zhu, L., & Nie, L. (2023). Mutual-Enhanced Incongruity Learning Network for Multi-Modal Sarcasm Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 37(8), 9507-9515. https://doi.org/10.1609/aaai.v37i8.26138
AAAI Technical Track on Machine Learning III