Hao, W., Zhang, Z., & Guan, H. (2018). Integrating Both Visual and Audio Cues for Enhanced Video Caption. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). https://doi.org/10.1609/aaai.v32i1.12330