Hao, W., Zhang, Z., & Guan, H. (2018). Integrating Both Visual and Audio Cues for Enhanced Video Caption. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/12330