Salin, E., B. Farah, S. Ayache, and B. Favre. “Are Vision-Language Transformers Learning Multimodal Representations? A Probing Perspective”. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 10, June 2022, pp. 11248-57, doi:10.1609/aaai.v36i10.21375.