[1] G. Li, N. Duan, Y. Fang, M. Gong, and D. Jiang, “Unicoder-VL: A Universal Encoder for Vision and Language by Cross-Modal Pre-Training,” AAAI, vol. 34, no. 07, pp. 11336-11344, Apr. 2020.