Tang, Yunlong, Daiki Shimada, Jing Bi, Mingqian Feng, Hang Hua, and Chenliang Xu. “Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding.” Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 7 (April 11, 2025): 7293–7301. Accessed July 15, 2026. https://ojs.aaai.org/index.php/AAAI/article/view/32784.