MedReasoner: Reinforcement Learning Drives Reasoning Grounding from Clinical Thought to Pixel-Level Precision

Authors

  • Zhonghao Yan, Beijing University of Posts and Telecommunications
  • Muxi Diao, Beijing University of Posts and Telecommunications; Zhongguancun Academy
  • Yuxuan Yang, Beijing University of Posts and Telecommunications
  • Ruoyan Jing, Beijing University of Posts and Telecommunications
  • Jiayuan Xu, Beijing University of Posts and Telecommunications
  • Kaizhou Zhang, Beijing University of Posts and Telecommunications
  • Lele Yang, Beijing University of Posts and Telecommunications
  • Yanxi Liu, Beijing Information Science and Technology University
  • Kongming Liang, Beijing University of Posts and Telecommunications
  • Zhanyu Ma, Beijing University of Posts and Telecommunications

DOI:

https://doi.org/10.1609/aaai.v40i14.38141

Abstract

Accurately grounding regions of interest (ROIs) is critical for diagnosis and treatment planning in medical imaging. Although multimodal large language models (MLLMs) combine visual perception with natural-language understanding, current medical-grounding pipelines still rely on supervised fine-tuning with explicit spatial hints, which leaves them ill-equipped to handle the implicit queries common in clinical practice. This work makes three core contributions. First, we define Unified Medical Reasoning Grounding (UMRG), a novel vision–language task that demands both clinical reasoning and pixel-level grounding. Second, we release U-MRG-14K, a dataset of 14K samples that pairs pixel-level masks with implicit clinical queries and reasoning traces, spanning 10 modalities, 15 super-categories, and 108 specific categories. Third, we introduce MedReasoner, a modular framework that explicitly decouples reasoning from segmentation: an MLLM reasoner is optimized with reinforcement learning, a frozen segmentation expert converts the reasoner's spatial prompts into masks, and the two are aligned through format and accuracy rewards. MedReasoner achieves state-of-the-art performance on U-MRG-14K and generalizes strongly to unseen clinical queries, underscoring the promise of reinforcement learning for interpretable medical grounding.
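The abstract states only that the reasoner is aligned with the frozen segmentation expert through format and accuracy rewards; it does not give their definitions. The sketch below is a hypothetical illustration of what a reward of that shape could look like, assuming the reasoner writes its reasoning inside `<think>` tags and a bounding-box spatial prompt `[x1, y1, x2, y2]` inside `<answer>` tags, and assuming accuracy is scored as box IoU against a ground-truth box. The tag names, the box-only answer, the IoU metric, and the additive combination (`total_reward`) are all assumptions for illustration, not the paper's reward.

```python
import re
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

# Hypothetical output template: reasoning in <think>, spatial prompt in <answer>.
ANSWER_PATTERN = re.compile(r"^<think>.*?</think>\s*<answer>(.*?)</answer>$", re.DOTALL)


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows the assumed template, else 0.0."""
    return 1.0 if ANSWER_PATTERN.match(completion.strip()) else 0.0


def parse_box(completion: str) -> Optional[Box]:
    """Extract the first four numbers inside <answer> as a box, or None."""
    match = ANSWER_PATTERN.match(completion.strip())
    if match is None:
        return None
    nums = re.findall(r"-?\d+(?:\.\d+)?", match.group(1))
    if len(nums) < 4:
        return None
    x1, y1, x2, y2 = map(float, nums[:4])
    return (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def accuracy_reward(completion: str, gt_box: Box) -> float:
    """Hypothetical accuracy reward: IoU of the predicted box with ground truth."""
    box = parse_box(completion)
    return 0.0 if box is None else box_iou(box, gt_box)


def total_reward(completion: str, gt_box: Box) -> float:
    """Hypothetical combined reward: format term plus accuracy term."""
    return format_reward(completion) + accuracy_reward(completion, gt_box)


if __name__ == "__main__":
    sample = "<think>The lesion lies in the upper-left region.</think><answer>[10, 10, 50, 50]</answer>"
    print(total_reward(sample, (10, 10, 50, 50)))  # perfect box plus valid format
    print(total_reward(sample, (30, 10, 70, 50)))  # valid format, partial overlap
```

Keeping the format term separate from the accuracy term means a well-formed but poorly localized answer still earns partial credit, while a malformed answer cannot be parsed into a prompt for the segmentation expert and scores zero on accuracy.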

Published

2026-03-14

How to Cite

Yan, Z., Diao, M., Yang, Y., Jing, R., Xu, J., Zhang, K., Yang, L., Liu, Y., Liang, K., & Ma, Z. (2026). MedReasoner: Reinforcement Learning Drives Reasoning Grounding from Clinical Thought to Pixel-Level Precision. Proceedings of the AAAI Conference on Artificial Intelligence, 40(14), 11577-11585. https://doi.org/10.1609/aaai.v40i14.38141

Issue

Vol. 40 No. 14

Section

AAAI Technical Track on Computer Vision XI