Ruan, Ludan, Anwen Hu, Yuqing Song, Liang Zhang, Sipeng Zheng, and Qin Jin. 2023. “Accommodating Audio Modality in CLIP for Multimodal Processing.” Proceedings of the AAAI Conference on Artificial Intelligence 37 (8): 9641–49. https://doi.org/10.1609/aaai.v37i8.26153.