He, D., Zhao, X., Huang, J., Li, F., Liu, X. and Wen, S. (2019) “Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos”, Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), pp. 8393-8400. doi: 10.1609/aaai.v33i01.33018393.