Fang, X., Liu, D., Fang, W., Zhou, P., Xu, Z., Xu, W., … Li, R. (2024). Fewer Steps, Better Performance: Efficient Cross-Modal Clip Trimming for Video Moment Retrieval Using Language. Proceedings of the AAAI Conference on Artificial Intelligence, 38(2), 1735–1743. https://doi.org/10.1609/aaai.v38i2.27941