Mitigating Entity Hallucinations in 3D Radiology Report Generation via Dual-Stream Alignment

Lingyu Zhou; Yue Yu; Zhang Yi; Xiuyuan Xu

doi:10.1609/aaai.v40i16.38379

Authors

Lingyu Zhou, Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu, China
Yue Yu, Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu, China
Zhang Yi, Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu, China
Xiuyuan Xu, Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu, China

DOI:

https://doi.org/10.1609/aaai.v40i16.38379

Abstract

Entity hallucination poses a major challenge in radiology report generation (RRG), particularly for 3D CT scans where complex spatial contexts amplify factual errors. To address this, medical entity phrases serve as key carriers for multi-modal prompting, integrating expert knowledge into the vision-language model. Current methods use unified cross-attention for volume-phrase alignment, failing to account for anatomical specificity during the alignment process. In this work, we introduce the Dual-stream Entity Alignment Reporting network (DEAR) that separately models organ and lesion entities to resolve anatomical bias. Specifically, the dual-stream entity aligner is designed to partition medical entity phrases into organ and lesion streams, feeding them into separate cross-attention blocks in parallel to achieve fine-grained volume–phrase alignment. For structurally regular and spatially stable organ entities, an organ-guided cross-attention (OGCA) block is proposed to enforce structural consistency by retrieving the top-k voxel tokens via volume–phrase similarity and preserving spatial connectivity through morphological dilation. Meanwhile, a lesion-guided cross-attention (LGCA) block is introduced for structurally irregular and spatially variable lesion entities, enhancing anomaly sensitivity through phrase-weighted attention and refining discriminative boundaries via 3D residual Laplacian filtering. Experiments demonstrate that DEAR significantly reduces entity hallucinations and improves clinical factuality in 3D RRG benchmarks.
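
The abstract names two spatial operations without giving their details: top-k voxel retrieval followed by morphological dilation (in OGCA), and 3D residual Laplacian filtering (in LGCA). The following is a minimal NumPy sketch of what such operations could look like, not the authors' implementation. The function names, the use of cosine similarity, the 6-connected structuring element, the edge padding, and the `alpha` weight are all assumptions made for illustration.

```python
import numpy as np


def dilate6(mask):
    """One step of binary dilation with a 6-connected neighbourhood (assumed)."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    out = mask.copy()
    for axis in range(3):
        for shift in (-1, 1):
            # Shift the padded volume by one voxel, then crop back to the original size.
            out |= np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
    return out


def topk_dilated_mask(voxel_tokens, phrase_emb, grid_shape, k):
    """Select the k voxel tokens most similar to a phrase, then dilate the selection.

    voxel_tokens: (N, d) array, phrase_emb: (d,) array,
    grid_shape: (D, H, W) with D * H * W == N. All names are hypothetical.
    """
    v = voxel_tokens / (np.linalg.norm(voxel_tokens, axis=1, keepdims=True) + 1e-8)
    p = phrase_emb / (np.linalg.norm(phrase_emb) + 1e-8)
    sim = v @ p                      # cosine similarity per voxel token (assumed metric)
    top = np.argsort(sim)[-k:]       # indices of the k highest similarities
    mask = np.zeros(sim.shape[0], dtype=bool)
    mask[top] = True
    return dilate6(mask.reshape(grid_shape))


def residual_laplacian(vol, alpha=1.0):
    """Add a negated discrete 3D Laplacian response back onto the input volume.

    This is one possible reading of "residual Laplacian filtering": the output is
    vol - alpha * laplacian(vol), which sharpens boundaries. alpha is an assumption.
    """
    vol = np.asarray(vol, dtype=float)
    padded = np.pad(vol, 1, mode="edge")
    lap = -6.0 * vol
    for axis in range(3):
        for shift in (-1, 1):
            lap += np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
    return vol - alpha * lap
```

In this sketch the dilated mask would restrict which voxel tokens an organ phrase attends to, while the Laplacian residual leaves flat regions unchanged and amplifies intensity changes at boundaries. How DEAR actually combines these steps with cross-attention is not stated in the abstract.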
