Chen, Y., Wang, J., Lin, L., Qi, Z., Ma, J., & Shan, Y. (2023). Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval. Proceedings of the AAAI Conference on Artificial Intelligence, 37(1), 396-404. https://doi.org/10.1609/aaai.v37i1.25113