Mind-the-Gap! Unsupervised Domain Adaptation for Text-Video Retrieval


  • Qingchao Chen (Peking University; University of Oxford)
  • Yang Liu (Peking University; University of Oxford)
  • Samuel Albanie (University of Oxford)




Language and Vision, Image and Video Retrieval, Transfer/Adaptation/Multi-task/Meta/Automated Learning


When can we expect a text-video retrieval system to work effectively on datasets that differ from its training domain? In this work, we investigate this question through the lens of unsupervised domain adaptation, in which the objective is to match natural language queries and video content in the presence of domain shift at query time. Such systems have significant practical applications since they are capable of generalising to new data sources without requiring corresponding text annotations. We make the following contributions: (1) We propose the UDAVR (Unsupervised Domain Adaptation for Video Retrieval) benchmark and employ it to study the performance of text-video retrieval in the presence of domain shift. (2) We propose Concept-Aware-Pseudo-Query (CAPQ), a method for learning discriminative and transferable features that bridge these cross-domain discrepancies to enable effective target domain retrieval using source domain supervision. (3) We show that CAPQ outperforms alternative domain adaptation strategies on UDAVR.




How to Cite

Chen, Q., Liu, Y., & Albanie, S. (2021). Mind-the-Gap! Unsupervised Domain Adaptation for Text-Video Retrieval. Proceedings of the AAAI Conference on Artificial Intelligence, 35(2), 1072-1080. https://doi.org/10.1609/aaai.v35i2.16192



AAAI Technical Track on Computer Vision I