Verb Mirage: Unveiling and Assessing Verb Concept Hallucinations in Multimodal Large Language Models

Authors

  • Zehao Wang, Shanghai Jiao Tong University
  • Xinpeng Liu, Shanghai Jiao Tong University; Shanghai Innovation Institute
  • Yudonglin Zhang, Shanghai Jiao Tong University
  • Xiaoqian Wu, Shanghai Jiao Tong University
  • Zhou Fang, Shanghai Jiao Tong University
  • Yifan Fang, Shanghai Jiao Tong University
  • Junfu Pu, ARC Lab, Tencent PCG
  • Cewu Lu, Shanghai Jiao Tong University; Shanghai Innovation Institute
  • Yong-Lu Li, Shanghai Jiao Tong University; Shanghai Innovation Institute

DOI:

https://doi.org/10.1609/aaai.v40i12.38005

Abstract

Multimodal Large Language Models (MLLMs) have recently garnered significant attention and demonstrate outstanding capabilities across tasks such as OCR, VQA, and captioning. However, hallucination remains a persistent issue. Although numerous methods have been proposed to mitigate hallucinations and have achieved notable improvements, they focus primarily on hallucinations involving object/noun concepts; verb concepts, which are crucial for understanding human actions, have been largely overlooked. In this paper, to the best of our knowledge, we are the first to investigate the verb hallucination phenomenon in MLLMs from multiple perspectives. Our findings reveal that most state-of-the-art MLLMs suffer from severe verb hallucination. We further evaluate existing mitigation methods designed for object concept hallucination and find that they do not effectively address verb hallucination. To tackle this issue, we propose a baseline method based on fine-tuning with rich verb knowledge. Experimental results demonstrate that our method significantly reduces verb-related hallucinations.

Published

2026-03-14

How to Cite

Wang, Z., Liu, X., Zhang, Y., Wu, X., Fang, Z., Fang, Y., … Li, Y.-L. (2026). Verb Mirage: Unveiling and Assessing Verb Concept Hallucinations in Multimodal Large Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(12), 10349–10357. https://doi.org/10.1609/aaai.v40i12.38005

Section

AAAI Technical Track on Computer Vision IX