Xuan, Hanyu, Zhenyu Zhang, Shuo Chen, Jian Yang, and Yan Yan. “Cross-Modal Attention Network for Temporal Inconsistent Audio-Visual Event Localization.” Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 01 (April 3, 2020): 279–286. Accessed September 22, 2024. https://ojs.aaai.org/index.php/AAAI/article/view/5361.