LLM-RG4: Flexible and Factual Radiology Report Generation Across Diverse Input Contexts

Authors

  • Zhuhao Wang — School of Biomedical Engineering, Tsinghua University, Beijing, China
  • Yihua Sun — School of Biomedical Engineering, Tsinghua University, Beijing, China
  • Zihan Li — School of Biomedical Engineering, Tsinghua University, Beijing, China
  • Xuan Yang — School of Biomedical Engineering, Tsinghua University, Beijing, China
  • Fang Chen — School of Biomedical Engineering, and Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, China
  • Hongen Liao — School of Biomedical Engineering, Tsinghua University, Beijing, China; School of Biomedical Engineering, and Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, China

DOI:

https://doi.org/10.1609/aaai.v39i8.32890

Abstract

Drafting radiology reports is a complex task requiring flexibility, where radiologists tailor content to the available information and particular clinical demands. However, most current radiology report generation (RRG) models are constrained to a fixed task paradigm, such as predicting the full "findings" section from a single image, inherently involving a mismatch between inputs and outputs. The trained models lack flexibility for diverse inputs and can generate harmful, input-agnostic hallucinations. To bridge the gap between current RRG models and clinical demands in practice, we first develop a data generation pipeline to create a new MIMIC-RG4 dataset, which considers four common radiology report drafting scenarios and has perfectly corresponding inputs and outputs. Second, we propose a novel large language model (LLM) based RRG framework, namely LLM-RG4, which utilizes the LLM's flexible instruction-following capabilities and extensive general knowledge. We further develop an adaptive token fusion module that offers flexibility to handle diverse scenarios with different input combinations, while minimizing the additional computational burden associated with increased input volumes. In addition, we propose a token-level loss weighting strategy to direct the model's attention toward positive and uncertain descriptions. Experimental results demonstrate that LLM-RG4 achieves state-of-the-art performance in both clinical efficacy and natural language generation on the MIMIC-RG4 and MIMIC-CXR datasets. We quantitatively demonstrate that our model has minimal input-agnostic hallucinations, whereas current open-source models commonly suffer from this problem.

Published

2025-04-11

How to Cite

Wang, Z., Sun, Y., Li, Z., Yang, X., Chen, F., & Liao, H. (2025). LLM-RG4: Flexible and Factual Radiology Report Generation Across Diverse Input Contexts. Proceedings of the AAAI Conference on Artificial Intelligence, 39(8), 8250–8258. https://doi.org/10.1609/aaai.v39i8.32890

Section

AAAI Technical Track on Computer Vision VII