Sarkar, P., & Etemad, A. (2023). Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity. Proceedings of the AAAI Conference on Artificial Intelligence, 37(8), 9723-9732. https://doi.org/10.1609/aaai.v37i8.26162