[1] Li, Z. et al. 2025. Patch-level Sounding Object Tracking for Audio-Visual Question Answering. Proceedings of the AAAI Conference on Artificial Intelligence. 39, 5 (Apr. 2025), 5075–5083. DOI:https://doi.org/10.1609/aaai.v39i5.32538.