Tian, K., Cheng, Y., Liu, Y., Hou, X., Chen, Q., & Li, H. (2024). Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 38(6), 5207-5214. https://doi.org/10.1609/aaai.v38i6.28327