Jang, Jiho, Chaerin Kong, DongHyeon Jeon, Seonhoon Kim, and Nojun Kwak. 2023. “Unifying Vision-Language Representation Space With Single-Tower Transformer.” Proceedings of the AAAI Conference on Artificial Intelligence 37 (1): 980–88. https://doi.org/10.1609/aaai.v37i1.25178.