YU, F.; TANG, J.; YIN, W.; SUN, Y.; TIAN, H.; WU, H.; WANG, H. ERNIE-ViL: Knowledge Enhanced Vision-Language Representations through Scene Graphs. Proceedings of the AAAI Conference on Artificial Intelligence, [S. l.], v. 35, n. 4, p. 3208-3216, 2021. DOI: 10.1609/aaai.v35i4.16431. Available at: https://ojs.aaai.org/index.php/AAAI/article/view/16431. Accessed on: 29 Mar. 2024.