[1] X. You, “Knowledge Completes the Vision: A Multimodal Entity-aware Retrieval-Augmented Generation Framework for News Image Captioning”, AAAI, vol. 40, no. 14, pp. 12108–12116, Mar. 2026.