Xu, R., Xiong, C., Chen, W. and Corso, J. (2015) “Jointly Modeling Deep Video and Compositional Text to Bridge Vision and Language in a Unified Framework”, Proceedings of the AAAI Conference on Artificial Intelligence, 29(1). doi: 10.1609/aaai.v29i1.9512.