MetaGPT: A Large Vision-Language Model for Meme Metaphor Understanding

Bo Xu; Chenyuan Wang; Xinyu Chen; Hongfei Lin; Feng Xia

doi:10.1609/aaai.v40i19.38638

Authors

Bo Xu, Dalian University of Technology
Chenyuan Wang, Dalian University of Technology
Xinyu Chen, Dalian University of Technology
Hongfei Lin, Dalian University of Technology
Feng Xia, Royal Melbourne Institute of Technology

DOI: https://doi.org/10.1609/aaai.v40i19.38638

Abstract

Memes are an expressive medium that often conveys rich emotions and intentions. Recent studies have confirmed the critical role of metaphors in meme understanding. However, existing metaphor research relies heavily on manual annotations, and mainstream vision-language models (VLMs) still struggle to recognize and comprehend metaphors. To address these challenges, we introduce MetaGPT, the first vision-language model specifically designed for meme metaphor understanding. MetaGPT can identify and extract metaphors in memes and generate accurate meme interpretations. Furthermore, we construct MUnd, a dedicated dataset for meme understanding that comprises approximately 32,000 high-quality question-answer (QA) pairs across three core tasks: metaphor detection, metaphor domain extraction, and meme interpretation. Based on MUnd, we further propose an evaluation benchmark for meme understanding and conduct a comprehensive assessment of existing VLMs. Experimental results reveal that current models still face challenges in metaphor comprehension, while MetaGPT consistently outperforms them across all tasks, highlighting its potential for advancing meme understanding.
