[1] Xu, R., Xiong, C., Chen, W. and Corso, J. 2015. Jointly Modeling Deep Video and Compositional Text to Bridge Vision and Language in a Unified Framework. Proceedings of the AAAI Conference on Artificial Intelligence. 29, 1 (Feb. 2015). DOI:https://doi.org/10.1609/aaai.v29i1.9512.