Li, Zhangbin, Jinxing Zhou, Jing Zhang, Shengeng Tang, Kun Li, and Dan Guo. 2025. “Patch-Level Sounding Object Tracking for Audio-Visual Question Answering.” Proceedings of the AAAI Conference on Artificial Intelligence 39 (5): 5075-83. https://doi.org/10.1609/aaai.v39i5.32538.