Chen, J., Pan, Y., Li, Y., Yao, T., Chao, H., & Mei, T. (2019). Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 8167-8174. https://doi.org/10.1609/aaai.v33i01.33018167