1. Xu R, Xiong C, Chen W, Corso J. Jointly Modeling Deep Video and Compositional Text to Bridge Vision and Language in a Unified Framework. AAAI [Internet]. 2015 Feb. 19 [cited 2026 May 9];29(1). Available from: https://ojs.aaai.org/index.php/AAAI/article/view/9512