Ruan, Ludan, et al. “Accommodating Audio Modality in CLIP for Multimodal Processing”. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 8, June 2023, pp. 9641-49, doi:10.1609/aaai.v37i8.26153.