[1] Wang, Z. et al. 2022. SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning. Proceedings of the AAAI Conference on Artificial Intelligence. 36, 5 (Jun. 2022), 5914–5922. DOI:https://doi.org/10.1609/aaai.v36i5.20536.