[1] Y. Wang, J. Xu, and Y. Sun, “End-to-End Transformer Based Model for Image Captioning”, AAAI, vol. 36, no. 3, pp. 2585-2594, Jun. 2022.