LU, Zhu; GENG, Tiantian; CHEN, Yangye; WANG, Teng; LU, Ping; ZHENG, Feng. R-AVST: Empowering Video-LLMs with Fine-Grained Spatio-Temporal Reasoning in Complex Audio-Visual Scenarios. Proceedings of the AAAI Conference on Artificial Intelligence, [S. l.], v. 40, n. 9, p. 7627–7635, 2026. DOI: 10.1609/aaai.v40i9.37704. Available at: https://ojs.aaai.org/index.php/AAAI/article/view/37704. Accessed on: 13 May 2026.