Jang, Jiho, Chaerin Kong, DongHyeon Jeon, Seonhoon Kim, and Nojun Kwak. “Unifying Vision-Language Representation Space with Single-Tower Transformer.” Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 1 (June 26, 2023): 980–988. Accessed September 13, 2024. https://ojs.aaai.org/index.php/AAAI/article/view/25178.