[1]
Yu, F., Tang, J., Yin, W., Sun, Y., Tian, H., Wu, H. and Wang, H. 2021. ERNIE-ViL: Knowledge Enhanced Vision-Language Representations through Scene Graphs. Proceedings of the AAAI Conference on Artificial Intelligence. 35, 4 (May 2021), 3208–3216. DOI:https://doi.org/10.1609/aaai.v35i4.16431.