Zhu, W., Wang, Y., Li, H., & Zhu, P. (2026). VTD-CLIP: Video-to-Text Discretization via Prompting CLIP. Proceedings of the AAAI Conference on Artificial Intelligence, 40(16), 13979–13987. https://doi.org/10.1609/aaai.v40i16.38408