Ruan, L. (2023) “Accommodating Audio Modality in CLIP for Multimodal Processing”, Proceedings of the AAAI Conference on Artificial Intelligence, 37(8), pp. 9641–9649. doi: 10.1609/aaai.v37i8.26153.