[1] Z. Wang et al., “SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning,” in Proc. AAAI Conf. Artif. Intell., vol. 36, no. 5, pp. 5914–5922, Jun. 2022.