[1] Jang, J. et al. 2023. Unifying Vision-Language Representation Space with Single-Tower Transformer. Proceedings of the AAAI Conference on Artificial Intelligence. 37, 1 (Jun. 2023), 980–988. DOI:https://doi.org/10.1609/aaai.v37i1.25178.