1. Yu F, Tang J, Yin W, Sun Y, Tian H, Wu H, Wang H. ERNIE-ViL: Knowledge Enhanced Vision-Language Representations through Scene Graphs. AAAI [Internet]. 2021 May 18 [cited 2024 Apr 24];35(4):3208-16. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/16431