CHEN, Yizhen; WANG, Jie; LIN, Lijian; QI, Zhongang; MA, Jin; SHAN, Ying. Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval. Proceedings of the AAAI Conference on Artificial Intelligence, [S. l.], v. 37, n. 1, p. 396–404, 2023. DOI: 10.1609/aaai.v37i1.25113. Available at: https://ojs.aaai.org/index.php/AAAI/article/view/25113. Accessed: 25 May 2026.