[1] L. Ventura, A. Yang, C. Schmid, and G. Varol, “CoVR: Learning Composed Video Retrieval from Web Video Captions”, AAAI, vol. 38, no. 6, pp. 5270-5279, Mar. 2024.