Bootstrapping Large Language Models for Radiology Report Generation

Authors

  • Chang Liu, University of Science and Technology of China
  • Yuanhe Tian, University of Washington
  • Weidong Chen, University of Science and Technology of China
  • Yan Song, University of Science and Technology of China
  • Yongdong Zhang, University of Science and Technology of China

DOI:

https://doi.org/10.1609/aaai.v38i17.29826

Keywords:

NLP: Applications, NLP: Language Grounding & Multi-modal NLP

Abstract

Radiology report generation (RRG) aims to automatically generate a free-text description for a given clinical radiograph, e.g., a chest X-ray image. Existing approaches typically train task-specific models from scratch on public yet limited data, which often leads to inferior performance because such models struggle both to align visual and textual features and to generate informative reports accordingly. Large language models (LLMs) offer a promising solution for text generation given their power in learning from big data, especially in cross-modal scenarios such as RRG. However, most existing LLMs are pre-trained on general-domain data and, when applied to RRG, suffer from the same problem as conventional approaches owing to the knowledge gap between the general and medical domains. Therefore, in this paper, we propose an approach to bootstrapping LLMs for RRG with an in-domain instance induction and a coarse-to-fine decoding process. Specifically, the in-domain instance induction process learns to align the LLM from general texts to radiology reports through contrastive learning. The coarse-to-fine decoding performs a text-elevating process over the reports returned by the ranker, further enhanced with visual features and refinement prompts. Experimental results on two prevailing RRG datasets, namely IU X-Ray and MIMIC-CXR, demonstrate the superiority of our approach over previous state-of-the-art solutions. Further analyses illustrate that the induction process enables the LLM to better align with the medical domain and that the coarse-to-fine generation allows it to produce more precise text.
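
The abstract describes the in-domain instance induction as contrastive alignment between general-domain text and radiology reports. The snippet below is a minimal, illustrative sketch of such an alignment objective (a symmetric InfoNCE-style loss over paired pooled embeddings); the function name, tensor shapes, and temperature value are assumptions for illustration and are not taken from the paper.

```python
# Illustrative sketch only: a symmetric InfoNCE-style contrastive loss that
# pulls an LLM's pooled representations of radiology reports toward their
# paired in-domain instances. This is NOT the authors' exact objective.
import torch
import torch.nn.functional as F


def contrastive_alignment_loss(general_emb: torch.Tensor,
                               report_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """general_emb: (B, D) pooled embeddings of general-domain text;
    report_emb: (B, D) pooled embeddings of the matched radiology reports."""
    g = F.normalize(general_emb, dim=-1)
    r = F.normalize(report_emb, dim=-1)
    logits = g @ r.t() / temperature            # (B, B) similarity matrix
    targets = torch.arange(g.size(0), device=g.device)
    # Matched pairs lie on the diagonal; all other entries act as negatives.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    # Random tensors stand in for pooled LLM hidden states.
    B, D = 8, 768
    loss = contrastive_alignment_loss(torch.randn(B, D), torch.randn(B, D))
    print(loss.item())
```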

Published

2024-03-24

How to Cite

Liu, C., Tian, Y., Chen, W., Song, Y., & Zhang, Y. (2024). Bootstrapping Large Language Models for Radiology Report Generation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(17), 18635-18643. https://doi.org/10.1609/aaai.v38i17.29826

Section

AAAI Technical Track on Natural Language Processing II