Jang J, Kong C, Jeon D, Kim S, Kwak N. Unifying Vision-Language Representation Space with Single-Tower Transformer. AAAI [Internet]. 2023 Jun. 26 [cited 2026 Jul. 20];37(1):980-8. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/25178