Jang, J., C. Kong, D. Jeon, S. Kim, and N. Kwak. “Unifying Vision-Language Representation Space with Single-Tower Transformer”. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 1, June 2023, pp. 980-988, doi:10.1609/aaai.v37i1.25178.