Zhou, Ziheng, Jinxing Zhou, Wei Qian, Shengeng Tang, Xiaojun Chang, and Dan Guo. “Dense Audio-Visual Event Localization Under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration.” Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 10 (April 11, 2025): 10905–10913. Accessed May 31, 2026. https://ojs.aaai.org/index.php/AAAI/article/view/33185.