Lu, Z., Geng, T., Chen, Y., Wang, T., Lu, P., & Zheng, F. (2026). R-AVST: Empowering Video-LLMs with Fine-Grained Spatio-Temporal Reasoning in Complex Audio-Visual Scenarios. Proceedings of the AAAI Conference on Artificial Intelligence, 40(9), 7627–7635. https://doi.org/10.1609/aaai.v40i9.37704