MetaGPT: A Large Vision-Language Model for Meme Metaphor Understanding
DOI:
https://doi.org/10.1609/aaai.v40i19.38638
Abstract
Memes are an expressive medium, often conveying rich emotions and intentions. Recent studies have confirmed the critical role of metaphors in meme understanding. However, existing metaphor research relies heavily on manual annotations, and mainstream vision-language models (VLMs) still struggle to recognize and comprehend metaphors. To address these challenges, we introduce MetaGPT, the first vision-language model specifically designed for meme metaphor understanding. MetaGPT can identify and extract metaphors in memes and generate accurate meme interpretations. Furthermore, we construct a dedicated dataset for meme understanding, MUnd, which comprises approximately 32,000 high-quality question-answer (QA) pairs across three core tasks: metaphor detection, metaphor domain extraction, and meme interpretation. Based on MUnd, we further propose an evaluation benchmark for meme understanding and conduct a comprehensive assessment of existing VLMs. Experimental results reveal that current models still face challenges in metaphor comprehension, while MetaGPT consistently outperforms them across all tasks, highlighting its potential for advancing meme understanding.
Published
2026-03-14
How to Cite
Xu, B., Wang, C., Chen, X., Lin, H., & Xia, F. (2026). MetaGPT: A Large Vision-Language Model for Meme Metaphor Understanding. Proceedings of the AAAI Conference on Artificial Intelligence, 40(19), 16040–16048. https://doi.org/10.1609/aaai.v40i19.38638
Section
AAAI Technical Track on Data Mining & Knowledge Management III