Tu, Yunbin, Liang Li, Li Su, and Qingming Huang. “Query-Centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning.” Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 7 (April 11, 2025): 7464–7472. Accessed May 13, 2026. https://ojs.aaai.org/index.php/AAAI/article/view/32803.