Yu, F., Tang, J., Yin, W., Sun, Y., Tian, H., Wu, H. and Wang, H. (2021) “ERNIE-ViL: Knowledge Enhanced Vision-Language Representations through Scene Graphs”, Proceedings of the AAAI Conference on Artificial Intelligence, 35(4), pp. 3208–3216. doi: 10.1609/aaai.v35i4.16431.