Xuan, H., Zhang, Z., Chen, S., Yang, J. and Yan, Y. (2020) “Cross-Modal Attention Network for Temporal Inconsistent Audio-Visual Event Localization”, Proceedings of the AAAI Conference on Artificial Intelligence, 34(01), pp. 279-286. doi: 10.1609/aaai.v34i01.5361.