[1] Y. Chen, J. Wang, L. Lin, Z. Qi, J. Ma, and Y. Shan, “Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval”, AAAI, vol. 37, no. 1, pp. 396–404, Jun. 2023.