Generalizing Vision-Language Models with Dedicated Prompt Guidance

Authors

  • Xinyao Li, School of Computer Science and Engineering, University of Electronic Science and Technology of China
  • Yinjie Min, School of Statistics and Data Science, Nankai University
  • Hongbo Chen, School of Computer Science and Engineering, University of Electronic Science and Technology of China
  • Zhekai Du, School of Computer Science and Engineering, University of Electronic Science and Technology of China
  • Fengling Li, University of Technology Sydney
  • Jingjing Li, School of Computer Science and Engineering, University of Electronic Science and Technology of China

DOI:

https://doi.org/10.1609/aaai.v40i28.39492

Abstract

Fine-tuning large pretrained vision-language models (VLMs) has emerged as a prevalent paradigm for downstream adaptation, yet it faces a critical trade-off between domain specificity and domain generalization (DG) ability. Current methods typically fine-tune a universal model on the entire dataset, which potentially compromises the ability to generalize to unseen domains. To fill this gap, we provide a theoretical understanding of the generalization ability for VLM fine-tuning, which reveals that training multiple parameter-efficient expert models on partitioned source domains leads to better generalization than fine-tuning a universal model. Inspired by this finding, we propose a two-step domain-expert-Guided DG (GuiDG) framework. GuiDG first employs prompt tuning to obtain source domain experts, then introduces a Cross-Modal Attention module to guide the fine-tuning of the vision encoder via adaptive expert integration. To better evaluate few-shot DG, we construct ImageNet-DG from ImageNet and its variants. Extensive experiments on standard DG benchmarks and ImageNet-DG demonstrate that GuiDG improves upon state-of-the-art fine-tuning methods while maintaining efficiency.
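
As a rough illustration of what "adaptive expert integration" could look like, the sketch below weights several per-domain expert features by their attention scores against an image feature. It is not the authors' implementation: the function name integrate_experts, the temperature parameter, the scaled dot-product scoring, and the toy 2-D vectors are all assumptions made for illustration. The abstract only states that a Cross-Modal Attention module integrates prompt-tuned experts to guide vision encoder fine-tuning.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def integrate_experts(image_feature, expert_features, temperature=1.0):
    """Hypothetical attention-weighted mix of per-domain expert features.

    image_feature: query vector (e.g. from a vision encoder).
    expert_features: one vector per source-domain expert, used as both
        keys and values (an assumption, not taken from the paper).
    Returns (weights, mixed_feature).
    """
    dim = len(image_feature)
    scale = math.sqrt(dim) * temperature
    scores = [dot(image_feature, e) / scale for e in expert_features]
    weights = softmax(scores)
    mixed = [
        sum(w * e[i] for w, e in zip(weights, expert_features))
        for i in range(dim)
    ]
    return weights, mixed


# Toy example: three experts, the first aligned with the image feature.
image = [1.0, 0.0]
experts = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, mixed = integrate_experts(image, experts)
```

In this toy setup the expert most aligned with the image feature receives the largest weight, so the mixed feature leans toward that expert while still drawing on the others.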

Published

2026-03-14

How to Cite

Li, X., Min, Y., Chen, H., Du, Z., Li, F., & Li, J. (2026). Generalizing Vision-Language Models with Dedicated Prompt Guidance. Proceedings of the AAAI Conference on Artificial Intelligence, 40(28), 23239–23247. https://doi.org/10.1609/aaai.v40i28.39492

Section

AAAI Technical Track on Machine Learning V