You, X., Huang, Q., Li, L., Zhang, C., Liu, X., Zhang, M., & Yu, J. (2026). Knowledge Completes the Vision: A Multimodal Entity-aware Retrieval-Augmented Generation Framework for News Image Captioning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(14), 12108–12116. https://doi.org/10.1609/aaai.v40i14.38200