Xuan, H., Zhang, Z., Chen, S., Yang, J., & Yan, Y. (2020). Cross-Modal Attention Network for Temporal Inconsistent Audio-Visual Event Localization. Proceedings of the AAAI Conference on Artificial Intelligence, 34(01), 279-286. https://doi.org/10.1609/aaai.v34i01.5361