[1]
W. Xue, “ShotVL: Human-Centric Highlight Frame Retrieval via Language Queries”, AAAI, vol. 39, no. 9, pp. 9050–9058, Apr. 2025.