(1) Sarkar, P.; Etemad, A. Self-Supervised Audio-Visual Representation Learning With Relaxed Cross-Modal Synchronicity. AAAI 2023, 37, 9723-9732.