[1] Salin, E., Farah, B., Ayache, S. and Favre, B. 2022. Are Vision-Language Transformers Learning Multimodal Representations? A Probing Perspective. Proceedings of the AAAI Conference on Artificial Intelligence. 36, 10 (Jun. 2022), 11248-11257. DOI:https://doi.org/10.1609/aaai.v36i10.21375.