[1] R. Yan, “Video-Text Pre-training with Learned Regions for Retrieval,” AAAI, vol. 37, no. 3, pp. 3100–3108, Jun. 2023.