Bootstrapping Large Language Models for Radiology Report Generation

Authors

  • Chang Liu, University of Science and Technology of China
  • Yuanhe Tian, University of Washington
  • Weidong Chen, University of Science and Technology of China
  • Yan Song, University of Science and Technology of China
  • Yongdong Zhang, University of Science and Technology of China

DOI:

https://doi.org/10.1609/aaai.v38i17.29826

Keywords:

NLP: Applications, NLP: Language Grounding & Multi-modal NLP

Abstract

Radiology report generation (RRG) aims to automatically generate a free-text description for a given clinical radiograph, e.g., a chest X-ray image. Existing approaches typically train task-specific models from scratch on public yet limited data, which often leads to inferior performance because such models struggle both to align visual and textual features and to generate informative reports accordingly. Large language models (LLMs) offer a promising solution for text generation given their power in learning from big data, especially in cross-modal scenarios such as RRG. However, most existing LLMs are pre-trained on general-domain data and, when applied to RRG, suffer from the same problem as conventional approaches owing to the knowledge gap between the general and medical domains. Therefore, in this paper, we propose an approach to bootstrapping LLMs for RRG with an in-domain instance induction and a coarse-to-fine decoding process. Specifically, the in-domain instance induction process learns to align the LLM from general texts to radiology reports through contrastive learning. The coarse-to-fine decoding performs a text-elevating process over the reports returned by the ranker, further enhanced with visual features and refinement prompts. Experimental results on two prevailing RRG datasets, namely IU X-Ray and MIMIC-CXR, demonstrate the superiority of our approach over previous state-of-the-art solutions. Further analyses illustrate that the induction process enables the LLM to better align with the medical domain and that the coarse-to-fine generation allows it to produce more precise text.
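
The abstract describes the in-domain instance induction as contrastive alignment between general-domain text and radiology reports. The snippet below is a minimal, illustrative sketch of such an alignment objective (a symmetric InfoNCE-style loss over paired pooled embeddings); the function name, tensor shapes, and temperature value are assumptions for illustration and are not taken from the paper.

```python
# Illustrative sketch only: a symmetric InfoNCE-style contrastive loss that
# pulls an LLM's pooled representations of radiology reports toward their
# paired in-domain instances. This is NOT the authors' exact objective.
import torch
import torch.nn.functional as F


def contrastive_alignment_loss(general_emb: torch.Tensor,
                               report_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """general_emb: (B, D) pooled embeddings of general-domain text;
    report_emb: (B, D) pooled embeddings of the matched radiology reports."""
    g = F.normalize(general_emb, dim=-1)
    r = F.normalize(report_emb, dim=-1)
    logits = g @ r.t() / temperature            # (B, B) similarity matrix
    targets = torch.arange(g.size(0), device=g.device)
    # Matched pairs lie on the diagonal; all other entries act as negatives.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    # Random tensors stand in for pooled LLM hidden states.
    B, D = 8, 768
    loss = contrastive_alignment_loss(torch.randn(B, D), torch.randn(B, D))
    print(loss.item())
```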

Published

2024-03-24

How to Cite

Liu, C., Tian, Y., Chen, W., Song, Y., & Zhang, Y. (2024). Bootstrapping Large Language Models for Radiology Report Generation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(17), 18635-18643. https://doi.org/10.1609/aaai.v38i17.29826

Section

AAAI Technical Track on Natural Language Processing II