[1] E. Salin, B. Farah, S. Ayache, and B. Favre, “Are Vision-Language Transformers Learning Multimodal Representations? A Probing Perspective”, AAAI, vol. 36, no. 10, pp. 11248–11257, Jun. 2022.