Chen, Yizhen, Jie Wang, Lijian Lin, Zhongang Qi, Jin Ma, and Ying Shan. “Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval”. Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 1 (June 26, 2023): 396–404. Accessed May 25, 2026. https://ojs.aaai.org/index.php/AAAI/article/view/25113.