M3UCD: A Multi-task Multimodal Metaphor Understanding Challenge Dataset for LLMs
DOI:
https://doi.org/10.1609/aaai.v40i41.40808
Abstract
Understanding multimodal metaphors represents a crucial pathway for machines to comprehend human cognition. However, current research remains constrained by superficial dataset annotations, insufficient systematic evaluation of large language models, and fragmented task frameworks. To bridge these gaps, the paper proposes a systematic solution with three components: (I) The largest fine-grained Multi-task Multimodal Metaphor Understanding Challenge Dataset (M3UCD), built via multi-perspective collaborative annotation. It contains 15,345 samples, each annotated with 12 manual attribute labels. (II) A systematic benchmarking of LLMs' capacity boundaries in metaphor understanding. Evaluation results reveal the persistent challenges LLMs face in this domain while validating M3UCD's effectiveness and potential. (III) A concise and unified multi-task baseline framework, shown to be effective in enhancing the metaphor understanding capabilities of MLLMs.
Published
2026-03-14
How to Cite
Zheng, T., Yang, Y., Dong, R., Ma, B., Wang, L., Zhou, X., … Osman, T. (2026). M3UCD: A Multi-task Multimodal Metaphor Understanding Challenge Dataset for LLMs. Proceedings of the AAAI Conference on Artificial Intelligence, 40(41), 35030–35040. https://doi.org/10.1609/aaai.v40i41.40808
Issue
Section
AAAI Technical Track on Natural Language Processing VI