[1] Y. Tang, D. Shimada, J. Bi, M. Feng, H. Hua, and C. Xu, “Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding,” AAAI, vol. 39, no. 7, pp. 7293–7301, Apr. 2025.