Zhao, S., Y. Ma, Y. Gu, J. Yang, T. Xing, P. Xu, R. Hu, H. Chai, and K. Keutzer. “An End-to-End Visual-Audio Attention Network for Emotion Recognition in User-Generated Videos”. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 01, Apr. 2020, pp. 303-11, doi:10.1609/aaai.v34i01.5364.