Yan, R., Shou, M. Z., Ge, Y., Wang, J., Lin, X., Cai, G., & Tang, J. (2023). Video-Text Pre-training with Learned Regions for Retrieval. Proceedings of the AAAI Conference on Artificial Intelligence, 37(3), 3100–3108. https://doi.org/10.1609/aaai.v37i3.25414