Salin, E., Farah, B., Ayache, S., & Favre, B. (2022). Are Vision-Language Transformers Learning Multimodal Representations? A Probing Perspective. Proceedings of the AAAI Conference on Artificial Intelligence, 36(10), 11248-11257. https://doi.org/10.1609/aaai.v36i10.21375