MoHoBench: Assessing Honesty of Multimodal Large Language Models via Unanswerable Visual Questions

Authors

  • Yanxu Zhu State Key Laboratory of Advanced Rail Autonomous Operation, Beijing Jiaotong University School of Computer Science and Technology, Beijing Jiaotong University
  • Shitong Duan College of Computer Science and Artificial Intelligence, Fudan University
  • Xiangxu Zhang Gaoling School of Artificial Intelligence, Renmin University of China
  • Jitao Sang State Key Laboratory of Advanced Rail Autonomous Operation, Beijing Jiaotong University School of Computer Science and Technology, Beijing Jiaotong University
  • Peng Zhang College of Computer Science and Artificial Intelligence, Fudan University
  • Tun Lu College of Computer Science and Artificial Intelligence, Fudan University
  • Xiao Zhou Gaoling School of Artificial Intelligence, Renmin University of China
  • Jing Yao Microsoft Research Asia
  • Xiaoyuan Yi Microsoft Research Asia
  • Xing Xie Microsoft Research Asia

DOI:

https://doi.org/10.1609/aaai.v40i34.40159

Abstract

Recently Multimodal Large Language Models (MLLMs) have achieved considerable advancements in vision-language tasks, yet produce potentially harmful or untrustworthy content. Despite substantial work investigating the trustworthiness of language models, MMLMs' capability to act honestly, especially when faced with visually unanswerable questions, remains largely underexplored. This work presents the first systematic assessment of honesty behaviors across various MLLMs. We ground honesty in models' response behaviors to unanswerable visual questions, define four representative types of such questions, and construct MoHoBench, a large-scale MMLM honest benchmark, consisting of 12k+ visual question samples, whose quality is guaranteed by multi-stage filtering and human verification. Using MoHoBench, we benchmarked the honesty of 28 popular MMLMs and conducted a comprehensive analysis. Our findings show that: (1) most models fail to appropriately refuse to answer when necessary, and (2) MMLMs' honesty is not solely a language modeling issue, but is deeply influenced by visual information, necessitating the development of dedicated methods for multimodal honesty alignment. Therefore, we implemented initial alignment methods using supervised and preference learning to improve honesty behavior, providing a foundation for future work on trustworthy MLLMs.

Published

2026-03-14

How to Cite

Zhu, Y., Duan, S., Zhang, X., Sang, J., Zhang, P., Lu, T., … Xie, X. (2026). MoHoBench: Assessing Honesty of Multimodal Large Language Models via Unanswerable Visual Questions. Proceedings of the AAAI Conference on Artificial Intelligence, 40(34), 29205–29213. https://doi.org/10.1609/aaai.v40i34.40159

Issue

Section

AAAI Technical Track on Machine Learning XI