M3UCD: A Multi-task Multimodal Metaphor Understanding Challenge Dataset for LLMs

Authors

  • Tianlong Zheng (Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi, China; University of Chinese Academy of Sciences, Beijing, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, China)
  • Yating Yang (Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi, China; University of Chinese Academy of Sciences, Beijing, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, China)
  • Rui Dong (Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi, China; University of Chinese Academy of Sciences, Beijing, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, China)
  • Bo Ma (Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi, China; University of Chinese Academy of Sciences, Beijing, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, China)
  • Lei Wang (Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi, China; University of Chinese Academy of Sciences, Beijing, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, China)
  • Xi Zhou (Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi, China; University of Chinese Academy of Sciences, Beijing, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, China)
  • Siru Miao (Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi, China; University of Chinese Academy of Sciences, Beijing, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, China)
  • Turghun Osman (Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi, China; University of Chinese Academy of Sciences, Beijing, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, China)

DOI:

https://doi.org/10.1609/aaai.v40i41.40808

Abstract

Understanding multimodal metaphors is a crucial pathway for machines to comprehend human cognition. However, current research remains constrained by superficial dataset annotations, insufficient systematic evaluation of large language models (LLMs), and fragmented task frameworks. To bridge these gaps, this paper proposes a systematic solution with three components: (I) M3UCD, the largest fine-grained Multi-task Multimodal Metaphor Understanding Challenge Dataset, built via multi-perspective collaborative annotation. It contains 15,345 samples, each annotated with 12 manual attribute labels. (II) A systematic benchmark of LLMs' capability boundaries in metaphor understanding. The evaluation results reveal the persistent challenges LLMs face in this domain while validating M3UCD's effectiveness and potential. (III) A concise, unified multi-task baseline framework, which is shown to be effective in enhancing the metaphor understanding capabilities of MLLMs.

Published

2026-03-14

How to Cite

Zheng, T., Yang, Y., Dong, R., Ma, B., Wang, L., Zhou, X., … Osman, T. (2026). M3UCD: A Multi-task Multimodal Metaphor Understanding Challenge Dataset for LLMs. Proceedings of the AAAI Conference on Artificial Intelligence, 40(41), 35030–35040. https://doi.org/10.1609/aaai.v40i41.40808

Section

AAAI Technical Track on Natural Language Processing VI