Mitigating Entity Hallucinations in 3D Radiology Report Generation via Dual-Stream Alignment
DOI:
https://doi.org/10.1609/aaai.v40i16.38379
Abstract
Entity hallucination is a major challenge in radiology report generation (RRG), particularly for 3D CT scans, where complex spatial context amplifies factual errors. To address this, medical entity phrases serve as key carriers for multi-modal prompting, injecting expert knowledge into vision-language models. However, current methods align volumes and phrases with a single unified cross-attention and do not account for anatomical specificity during alignment. In this work, we introduce the Dual-stream Entity Alignment Reporting network (DEAR), which models organ and lesion entities separately to resolve this anatomical bias. Specifically, a dual-stream entity aligner partitions medical entity phrases into organ and lesion streams and feeds them in parallel into separate cross-attention blocks to achieve fine-grained volume–phrase alignment. For structurally regular and spatially stable organ entities, we propose an organ-guided cross-attention (OGCA) block that enforces structural consistency by retrieving the top-k voxel tokens via volume–phrase similarity and preserving spatial connectivity through morphological dilation. For structurally irregular and spatially variable lesion entities, we introduce a lesion-guided cross-attention (LGCA) block that enhances anomaly sensitivity through phrase-weighted attention and refines discriminative boundaries via 3D residual Laplacian filtering. Experiments on 3D RRG benchmarks demonstrate that DEAR significantly reduces entity hallucinations and improves clinical factuality.
Published
2026-03-14
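The abstract names three concrete operations without giving formulas. As a hedged illustration only, the plain-Python sketch below shows them on a toy 3×3×3 grid: selecting the top-k voxel tokens by similarity to a phrase embedding (organ stream), growing that selection with binary morphological dilation (organ stream), and a residual Laplacian filter of the form x - alpha * Laplacian(x) (lesion stream). Everything here is an assumption, not the authors' implementation: the helper names (`topk_voxels`, `dilate`, `residual_laplacian`), cosine similarity as the volume–phrase score, the 6-connected structuring element, replicate padding at the borders, and the sign and weight `alpha` of the residual term. In DEAR these steps operate inside cross-attention blocks on learned multi-channel features; here they act on short vectors and a single scalar channel.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Index = Tuple[int, int, int]
Volume = List[List[List[float]]]

# Assumed 6-connected neighbourhood (face neighbours only) on a 3D grid.
NEIGHBORS_6 = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def in_bounds(idx: Index, shape: Index) -> bool:
    return all(0 <= i < n for i, n in zip(idx, shape))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def topk_voxels(tokens: Dict[Index, Sequence[float]], phrase: Sequence[float], k: int) -> Set[Index]:
    """Hypothetical organ-stream step: keep the k voxel tokens most similar to the phrase."""
    ranked = sorted(tokens, key=lambda idx: cosine(tokens[idx], phrase), reverse=True)
    return set(ranked[:k])


def dilate(mask: Set[Index], shape: Index, iterations: int = 1) -> Set[Index]:
    """Binary dilation with the 6-connected structuring element, clipped to the grid."""
    out = set(mask)
    for _ in range(iterations):
        grown = set(out)
        for d, h, w in out:
            for dd, dh, dw in NEIGHBORS_6:
                nb = (d + dd, h + dh, w + dw)
                if in_bounds(nb, shape):
                    grown.add(nb)
        out = grown
    return out


def residual_laplacian(vol: Volume, alpha: float = 0.5) -> Volume:
    """Hypothetical lesion-stream step: x - alpha * Laplacian(x), 7-point stencil.

    Replicate padding: an out-of-grid neighbour is treated as equal to the centre,
    so it contributes zero to the Laplacian.
    """
    shape = (len(vol), len(vol[0]), len(vol[0][0]))
    D, H, W = shape
    out = [[[0.0] * W for _ in range(H)] for _ in range(D)]
    for d in range(D):
        for h in range(H):
            for w in range(W):
                centre = vol[d][h][w]
                lap = 0.0
                for dd, dh, dw in NEIGHBORS_6:
                    nd, nh, nw = d + dd, h + dh, w + dw
                    if in_bounds((nd, nh, nw), shape):
                        lap += vol[nd][nh][nw] - centre
                out[d][h][w] = centre - alpha * lap
    return out


if __name__ == "__main__":
    shape = (3, 3, 3)
    # Toy 2-D "features": most voxels are orthogonal to the phrase embedding.
    tokens = {(d, h, w): [0.0, 1.0] for d in range(3) for h in range(3) for w in range(3)}
    tokens[(1, 1, 1)] = [1.0, 0.0]
    tokens[(1, 1, 2)] = [0.9, 0.1]
    selected = topk_voxels(tokens, phrase=[1.0, 0.0], k=2)
    region = dilate(selected, shape)
    print(sorted(selected), len(region))  # prints: [(1, 1, 1), (1, 1, 2)] 11

    vol = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
    vol[1][1][1] = 1.0
    sharpened = residual_laplacian(vol, alpha=0.5)
    print(sharpened[1][1][1], sharpened[0][1][1])  # prints: 4.0 -0.5
```

In this toy run, dilation grows two adjacent selected voxels into an 11-voxel region, which tends to close small gaps between selected tokens rather than guaranteeing a connected mask. The residual Laplacian boosts the isolated bright voxel and pushes its neighbours negative, exaggerating the local intensity step; that is one plausible reading of "refining discriminative boundaries", not a statement of how LGCA is actually parameterised.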
How to Cite
Zhou, L., Yu, Y., Yi, Z., & Xu, X. (2026). Mitigating Entity Hallucinations in 3D Radiology Report Generation via Dual-Stream Alignment. Proceedings of the AAAI Conference on Artificial Intelligence, 40(16), 13719–13727. https://doi.org/10.1609/aaai.v40i16.38379
Issue
Vol. 40 No. 16
Section
AAAI Technical Track on Computer Vision XIII