MIRA: Evaluating Multimodal AI on Complex Clinical Reasoning in Interventional Radiology

Authors

  • Jingxiong Li School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
  • Chenglu Zhu School of Engineering, Westlake University, Hangzhou 310024, China
  • Sunyi Zheng Tianjin Medical University Cancer Institute and Hospital, Department of Radiology, Tianjin 300060, China
  • Yuxuan Sun School of Engineering, Westlake University, Hangzhou 310024, China
  • Yifei Wang Nanjing First Hospital, Nanjing Medical University, Nanjing 210006, China
  • He Liu Department of Cardiology, Xuzhou Central Hospital, Xuzhou 221000, China
  • Yunlong Zhang School of Engineering, Westlake University, Hangzhou 310024, China
  • Yixuan Si School of Engineering, Westlake University, Hangzhou 310024, China
  • Lin Yang School of Engineering, Westlake University, Hangzhou 310024, China
  • Liang Xiao School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, Nanjing University of Science and Technology, Nanjing 210094, China

DOI:

https://doi.org/10.1609/aaai.v40i8.37549

Abstract

We present MIRA (Multimodal Interventional RAdiology evaluation), a comprehensive benchmark for evaluating large multimodal models on expert-level interventional radiology tasks that require specialized domain knowledge and advanced visual reasoning. Unlike existing medical benchmarks, which primarily provide binary labels without contextual depth, MIRA offers diverse question formats (open-ended, closed-ended, single-choice, and multiple-choice), each accompanied by a detailed expert-validated explanation. The benchmark contains approximately 184K high-quality medical images spanning multiple imaging modalities, together with 1.2M carefully generated question-answer pairs covering various anatomical regions. These pairs were created through a cascade methodology that involves expert interventional radiologists at both the data collection and validation stages. Our evaluation, which encompasses zero-shot testing and fine-tuning of large multimodal models, reveals significant performance gaps between AI systems and human specialists. Fine-tuning yields substantial improvements, with models reaching up to 0.80 accuracy on single-choice questions. MIRA thus establishes a challenging benchmark and points to promising directions for developing specialized clinical AI systems for interventional radiology.

Published

2026-03-14

How to Cite

Li, J., Zhu, C., Zheng, S., Sun, Y., Wang, Y., Liu, H., … Xiao, L. (2026). MIRA: Evaluating Multimodal AI on Complex Clinical Reasoning in Interventional Radiology. Proceedings of the AAAI Conference on Artificial Intelligence, 40(8), 6235–6243. https://doi.org/10.1609/aaai.v40i8.37549

Section

AAAI Technical Track on Computer Vision V