End-to-End Knowledge Distillation for Unsupervised Domain Adaptation with Large Vision-language Models

Authors

  • Yangtao Wang School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, China
  • Xingwei Deng School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, China
  • Yanzhao Xie School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, China
  • Weilong Peng School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, China
  • Siyuan Chen School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, China
  • Xiaocui Li Hunan University of Technology and Business, Changsha, China
  • Maobin Tang School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, China
  • Meie Fang School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, China

DOI:

https://doi.org/10.1609/aaai.v40i31.39871

Abstract

Knowledge distillation based on large vision-language models (VLMs) has recently emerged as a prominent approach for transferring knowledge from the source domain to the target domain in unsupervised domain adaptation (UDA) tasks. However, existing methods employ a two-stage training pipeline, which not only complicates the training procedure but also lacks interaction between the source and target domains, severely hindering real-time cross-domain knowledge transfer. To address these challenges, we propose End-to-End Knowledge Distillation for UDA with large VLMs (termed EKDA). (1) EKDA employs a lightweight prompt learning mechanism to first embed knowledge from the source domain into the VLM, and then simultaneously utilizes the image encoder and text encoder of the VLM to perform knowledge distillation on the target domain, significantly reducing the domain gap. (2) EKDA designs a teacher-student alternating training strategy to implement real-time collaborative interaction across domains, enabling an end-to-end paradigm that provides accurate source-domain-aware supervision for the target domain. We conduct extensive experiments on 4 widely recognized benchmark datasets: Office-31, Office-Home, VisDA-2017, and Mini-DomainNet. Experimental results demonstrate that EKDA achieves significant performance improvements over state-of-the-art UDA approaches while maintaining much lower model complexity. On Office-Home, for example, EKDA gains at least a 2.7% performance improvement while reducing the learnable parameters by over 80% compared with the state-of-the-art UDA baselines.
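The teacher-student alternating training strategy described in the abstract can be illustrated with a toy sketch. The paper's actual method operates on CLIP-style image/text encoders with learnable prompts; here, linear stand-ins replace the encoders, and all names (`W_teacher`, `W_student`, etc.) are illustrative assumptions, not the authors' code. The sketch only shows the alternating schedule: on even steps the teacher is fit on labeled source data, and on odd steps the student is distilled toward the teacher's soft predictions on unlabeled target data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (illustrative only): pre-extracted features replace the
# frozen VLM encoders; linear heads replace the prompt-conditioned
# teacher and the target-domain student.
D, C = 16, 3                      # feature dim, number of classes
X_src = rng.normal(size=(64, D))  # source-domain features (labeled)
y_src = rng.integers(0, C, 64)    # source labels
X_tgt = rng.normal(size=(64, D))  # target-domain features (unlabeled)

W_teacher = rng.normal(scale=0.1, size=(D, C))  # teacher head
W_student = rng.normal(scale=0.1, size=(D, C))  # student head

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

lr = 0.5
for step in range(200):
    if step % 2 == 0:
        # Teacher step: supervised cross-entropy on the source domain.
        p = softmax(X_src @ W_teacher)
        grad = X_src.T @ (p - np.eye(C)[y_src]) / len(y_src)
        W_teacher -= lr * grad
    else:
        # Student step: distill the teacher's soft predictions on the
        # unlabeled target domain (cross-entropy against teacher
        # probabilities, i.e. soft-label distillation).
        q = softmax(X_tgt @ W_teacher)   # teacher soft pseudo-labels
        p = softmax(X_tgt @ W_student)
        grad = X_tgt.T @ (p - q) / len(X_tgt)
        W_student -= lr * grad

# After training, the student's target-domain predictions should
# largely agree with the teacher's.
agree = (softmax(X_tgt @ W_student).argmax(1) ==
         softmax(X_tgt @ W_teacher).argmax(1)).mean()
print(f"teacher-student agreement on target: {agree:.2f}")
```

Because the two updates interleave within a single loop, the student receives supervision from the teacher as it adapts, rather than after a separate first stage finishes; this is the end-to-end property the abstract contrasts with two-stage pipelines.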

Published

2026-03-14

How to Cite

Wang, Y., Deng, X., Xie, Y., Peng, W., Chen, S., Li, X., Tang, M., & Fang, M. (2026). End-to-End Knowledge Distillation for Unsupervised Domain Adaptation with Large Vision-language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(31), 26624-26633. https://doi.org/10.1609/aaai.v40i31.39871

Section

AAAI Technical Track on Machine Learning VIII