He, D., X. Zhao, J. Huang, F. Li, X. Liu, and S. Wen. “Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos”. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, July 2019, pp. 8393-8400, doi:10.1609/aaai.v33i01.33018393.