1. Li G, Duan N, Fang Y, Gong M, Jiang D. Unicoder-VL: A Universal Encoder for Vision and Language by Cross-Modal Pre-Training. AAAI [Internet]. 2020 Apr 3 [cited 2024 Mar 28];34(07):11336-44. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/6795