Yu, F., Tang, J., Yin, W., Sun, Y., Tian, H., Wu, H. and Wang, H. (2021) “ERNIE-ViL: Knowledge Enhanced Vision-Language Representations through Scene Graphs”, Proceedings of the AAAI Conference on Artificial Intelligence, 35(4), pp. 3208-3216. doi: 10.1609/aaai.v35i4.16431.