Xuan, H., Z. Zhang, S. Chen, J. Yang, and Y. Yan. “Cross-Modal Attention Network for Temporal Inconsistent Audio-Visual Event Localization.” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 01, Apr. 2020, pp. 279-86, doi:10.1609/aaai.v34i01.5361.