[1] L. Ruan, A. Hu, Y. Song, L. Zhang, S. Zheng, and Q. Jin, “Accommodating Audio Modality in CLIP for Multimodal Processing,” AAAI, vol. 37, no. 8, pp. 9641–9649, Jun. 2023.