[1] Z. Zhao, “Temporal Calibrating and Distilling for Scene-Text Aware Text-Video Retrieval”, AAAI, vol. 40, no. 16, pp. 13323–13331, Mar. 2026.