MedGR2: Breaking the Data Barrier for Medical Reasoning via Generative Reward Learning

Weihai Zhi; Jiayan Guo; Shangyang Li

doi:10.1609/aaai.v40i34.40125

Authors

Weihai Zhi, Guangdong Institute of Intelligence Science and Technology, Zhuhai, China
Jiayan Guo, School of Intelligence Science and Technology, Peking University, Beijing, China
Shangyang Li, Guangdong Institute of Intelligence Science and Technology, Zhuhai, China

Abstract

The application of vision-language models in medicine is critically hampered by the scarcity of high-quality, expert-annotated data. Supervised fine-tuning on existing datasets often leads to poor generalization on unseen modalities and tasks, while reinforcement learning, a promising alternative, is stymied by the lack of reliable reward signals in this data-scarce domain. To address this challenge, we propose a Generative Reward Learning framework that establishes a self-improving training cycle. The framework jointly develops a data generator and a reward model, enabling the automated and continuous creation of high-quality multimodal medical data that serves as an effective training source for post-training. Our experiments demonstrate that supervised fine-tuning using the generated data already surpasses models trained on large-scale human-curated datasets. More importantly, when the generated data is further leveraged for reinforcement learning via Group Relative Policy Optimization, the resulting model achieves state-of-the-art cross-modality and cross-task generalization, significantly outperforming specialized reinforcement-learning-based methods. Notably, a compact model trained under this framework attains performance competitive with foundation models containing more than an order of magnitude more parameters. These results suggest a new paradigm for data-efficient learning in high-stakes medical domains, shifting the bottleneck from data scarcity to data generation and unlocking the potential of reinforcement learning for building robust and generalizable medical AI systems.
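For readers unfamiliar with Group Relative Policy Optimization, the group-relative advantage step it is named for can be sketched as follows. This is a minimal illustration of the general technique, not the authors' implementation: the function name, the `eps` stabilizer, the use of the population standard deviation, and the example scores are all assumptions made for this sketch. Each sampled answer to a question is scored, and its advantage is its reward normalized against the mean and spread of its own sampling group, so no separate value network is required.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group (GRPO-style).

    `rewards` holds the scores of several answers sampled for ONE prompt.
    `eps` (an assumption of this sketch) avoids division by zero when all
    answers in the group receive the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a sketch-level choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward-model scores for four sampled answers to one question.
scores = [0.9, 0.2, 0.6, 0.3]
advantages = group_relative_advantages(scores)
```

Answers scoring above the group mean receive positive advantages and are reinforced, while those below the mean are discouraged. When every answer in a group gets the same reward, all advantages are zero and that group contributes no learning signal, which is one reason the quality of the reward signal matters in this setting.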
