Wang, Yiyu, Jungang Xu, and Yingfei Sun. “End-to-End Transformer Based Model for Image Captioning.” Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 3 (June 28, 2022): 2585–2594. Accessed May 2, 2026. https://ojs.aaai.org/index.php/AAAI/article/view/20160.