End-to-End Knowledge Distillation for Unsupervised Domain Adaptation with Large Vision-language Models

Authors

  • Yangtao Wang School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, China
  • Xingwei Deng School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, China
  • Yanzhao Xie School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, China
  • Weilong Peng School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, China
  • Siyuan Chen School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, China
  • Xiaocui Li Hunan University of Technology and Business, Changsha, China
  • Maobin Tang School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, China
  • Meie Fang School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, China

DOI:

https://doi.org/10.1609/aaai.v40i31.39871

Abstract

Knowledge distillation based on large vision-language models (VLMs) has recently emerged as a prominent approach for transferring knowledge from the source domain to the target domain in unsupervised domain adaptation (UDA) tasks. However, existing methods employ a two-stage training pipeline, which not only complicates the training procedure but also lacks interaction between the source and target domains, severely hindering real-time cross-domain knowledge transfer. To address these challenges, we propose End-to-End Knowledge Distillation for UDA with large VLMs (termed EKDA). (1) EKDA employs a lightweight prompt learning mechanism to first embed knowledge from the source domain into the VLM, and then simultaneously utilizes the image encoder and text encoder of the VLM to perform knowledge distillation on the target domain, significantly reducing the domain gap. (2) EKDA designs a teacher-student alternating training strategy to implement real-time collaborative interaction across domains, enabling an end-to-end paradigm that provides accurate source-domain-aware supervision for the target domain. We conduct extensive experiments on 4 widely recognized benchmark datasets: Office-31, Office-Home, VisDA-2017, and Mini-DomainNet. Experimental results demonstrate that EKDA achieves significant performance improvements over state-of-the-art UDA approaches while maintaining much lower model complexity. On Office-Home, for example, EKDA gains at least a 2.7% performance improvement while reducing the learnable parameters by over 80% compared with the state-of-the-art UDA baselines.
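The teacher-student alternating training strategy described in the abstract can be illustrated with a toy sketch. The paper's actual method operates on CLIP-style image/text encoders with learnable prompts; here, linear stand-ins replace the encoders, and all names (`W_teacher`, `W_student`, etc.) are illustrative assumptions, not the authors' code. The sketch only shows the alternating schedule: on even steps the teacher is fit on labeled source data, and on odd steps the student is distilled toward the teacher's soft predictions on unlabeled target data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (illustrative only): pre-extracted features replace the
# frozen VLM encoders; linear heads replace the prompt-conditioned
# teacher and the target-domain student.
D, C = 16, 3                      # feature dim, number of classes
X_src = rng.normal(size=(64, D))  # source-domain features (labeled)
y_src = rng.integers(0, C, 64)    # source labels
X_tgt = rng.normal(size=(64, D))  # target-domain features (unlabeled)

W_teacher = rng.normal(scale=0.1, size=(D, C))  # teacher head
W_student = rng.normal(scale=0.1, size=(D, C))  # student head

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

lr = 0.5
for step in range(200):
    if step % 2 == 0:
        # Teacher step: supervised cross-entropy on the source domain.
        p = softmax(X_src @ W_teacher)
        grad = X_src.T @ (p - np.eye(C)[y_src]) / len(y_src)
        W_teacher -= lr * grad
    else:
        # Student step: distill the teacher's soft predictions on the
        # unlabeled target domain (cross-entropy against teacher
        # probabilities, i.e. soft-label distillation).
        q = softmax(X_tgt @ W_teacher)   # teacher soft pseudo-labels
        p = softmax(X_tgt @ W_student)
        grad = X_tgt.T @ (p - q) / len(X_tgt)
        W_student -= lr * grad

# After training, the student's target-domain predictions should
# largely agree with the teacher's.
agree = (softmax(X_tgt @ W_student).argmax(1) ==
         softmax(X_tgt @ W_teacher).argmax(1)).mean()
print(f"teacher-student agreement on target: {agree:.2f}")
```

Because the two updates interleave within a single loop, the student receives supervision from the teacher as it adapts, rather than after a separate first stage finishes; this is the end-to-end property the abstract contrasts with two-stage pipelines.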

Published

2026-03-14

How to Cite

Wang, Y., Deng, X., Xie, Y., Peng, W., Chen, S., Li, X., Tang, M., & Fang, M. (2026). End-to-End Knowledge Distillation for Unsupervised Domain Adaptation with Large Vision-language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(31), 26624-26633. https://doi.org/10.1609/aaai.v40i31.39871

Section

AAAI Technical Track on Machine Learning VIII