1. Huang Y, Tang J, Chen Z, Zhang R, Zhang X, Chen W, et al. Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-Modal Structured Representations. AAAI [Internet]. 2024 Mar 24 [cited 2026 May 25];38(3):2417-25. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/28017