[1] L. Zhou, H. Palangi, L. Zhang, H. Hu, J. Corso, and J. Gao, “Unified Vision-Language Pre-Training for Image Captioning and VQA”, AAAI, vol. 34, no. 07, pp. 13041-13049, Apr. 2020.