[1] K. Tian, Y. Cheng, Y. Liu, X. Hou, Q. Chen, and H. Li, “Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning”, AAAI, vol. 38, no. 6, pp. 5207-5214, Mar. 2024.