LI, Zhangbin; ZHOU, Jinxing; ZHANG, Jing; TANG, Shengeng; LI, Kun; GUO, Dan. Patch-level Sounding Object Tracking for Audio-Visual Question Answering. Proceedings of the AAAI Conference on Artificial Intelligence, [S. l.], v. 39, n. 5, p. 5075–5083, 2025. DOI: 10.1609/aaai.v39i5.32538. Available at: https://ojs.aaai.org/index.php/AAAI/article/view/32538. Accessed: 31 May 2026.