[1] Z. Li, J. Zhou, J. Zhang, S. Tang, K. Li, and D. Guo, “Patch-level Sounding Object Tracking for Audio-Visual Question Answering,” AAAI, vol. 39, no. 5, pp. 5075–5083, Apr. 2025.