Hao, Wangli, Zhaoxiang Zhang, and He Guan. 2018. “Integrating Both Visual and Audio Cues for Enhanced Video Caption”. Proceedings of the AAAI Conference on Artificial Intelligence 32 (1). https://doi.org/10.1609/aaai.v32i1.12330.