Xu, H., He, K., Plummer, B. A., Sigal, L., Sclaroff, S., & Saenko, K. (2019). Multilevel Language and Vision Integration for Text-to-Clip Retrieval. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 9062-9069. https://doi.org/10.1609/aaai.v33i01.33019062