Mutual-Enhanced Incongruity Learning Network for Multi-Modal Sarcasm Detection


  • Yang Qiao Shandong University
  • Liqiang Jing Shandong University
  • Xuemeng Song Shandong University
  • Xiaolin Chen Shandong University
  • Lei Zhu Shandong Normal University
  • Liqiang Nie Harbin Institute of Technology (Shenzhen)



ML: Multimodal Learning, SNLP: Sentiment Analysis and Stylistic Analysis


Sarcasm is a sophisticated linguistic phenomenon that is prevalent on today's social media platforms. Multi-modal sarcasm detection aims to identify whether a given sample with multi-modal information (i.e., text and image) is sarcastic. The key to this task lies in capturing both inter- and intra-modal incongruities within the same context. Although existing methods have achieved compelling success, they are either disturbed by irrelevant information extracted from the whole image and text or overlook important information because of incomplete input. To address these limitations, we propose a Mutual-enhanced Incongruity Learning Network for multi-modal sarcasm detection, named MILNet. In particular, we design a local semantic-guided incongruity learning module and a global incongruity learning module. Moreover, we introduce a mutual enhancement module that exploits the underlying consistency between the two modules to boost performance. Extensive experiments on a widely used dataset demonstrate the superiority of our model over cutting-edge methods.




How to Cite

Qiao, Y., Jing, L., Song, X., Chen, X., Zhu, L., & Nie, L. (2023). Mutual-Enhanced Incongruity Learning Network for Multi-Modal Sarcasm Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 37(8), 9507-9515.



AAAI Technical Track on Machine Learning III