Tian, K., Y. Cheng, Y. Liu, X. Hou, Q. Chen, and H. Li. “Towards Efficient and Effective Text-to-Video Retrieval With Coarse-to-Fine Visual Representation Learning.” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 6, Mar. 2024, pp. 5207-14, doi:10.1609/aaai.v38i6.28327.