He, D., Zhao, X., Huang, J., Li, F., Liu, X., & Wen, S. (2019). Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 8393-8400. https://doi.org/10.1609/aaai.v33i01.33018393