ProLoG: Hybrid Prompt and LoRA Based Adaptation of Vision-Language Models for OOD Generalization
DOI:
https://doi.org/10.1609/aaai.v40i29.39664

Abstract
While vision-language foundation models (VLMs) achieve remarkable performance when fine-tuned on downstream in-distribution (ID) data, fine-tuning compromises their generalization to out-of-distribution (OOD) data that deviates from the downstream tasks, due to overfitting. To address this, we propose ProLoG, a new adaptation method that effectively fine-tunes VLMs on downstream tasks while achieving high OOD performance. Specifically, we design a unique integration of prompt tuning and LoRA, offering a robust hybrid platform for improved performance. During training, we propose an augmentation-based regularization loss that enhances the generalization of our hybrid network by aligning augmented image features with LLM-generated texts containing key attributes of each class. Leveraging our hybrid design, we also introduce an adaptive inference strategy that flexibly applies the trained prompts and LoRA based on a task similarity score, handling both ID and OOD data effectively. Experimental results demonstrate that our proposed method outperforms existing works on various datasets, confirming its advantages.
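The adaptive inference strategy can be illustrated with a minimal sketch. The abstract does not specify the scoring function or threshold, so this example assumes the task similarity score is the maximum cosine similarity between the test image feature and the ID class prototypes, and that inference falls back to zero-shot text features when the score is below a threshold `tau`; all names and the gating rule here are illustrative assumptions, not the paper's exact method.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (plain lists)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)


def adaptive_inference(img_feat, id_prototypes, zs_text_feats,
                       adapted_text_feats, tau=0.5):
    """Classify an image feature, gating between adapted and zero-shot branches.

    Assumed scheme: the task similarity score is the max cosine similarity
    to the ID class prototypes; if the test sample looks in-distribution
    (score >= tau), use the prompt/LoRA-adapted text features, otherwise
    fall back to the frozen zero-shot text features.
    """
    score = max(cosine(img_feat, p) for p in id_prototypes)
    text_feats = adapted_text_feats if score >= tau else zs_text_feats
    logits = [cosine(img_feat, t) for t in text_feats]
    pred = max(range(len(logits)), key=lambda i: logits[i])
    return pred, score


# Toy usage: the image feature matches class 0 of the ID task, so the
# score is high and the adapted branch is selected.
pred, score = adaptive_inference(
    img_feat=[1.0, 0.0, 0.0],
    id_prototypes=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    zs_text_feats=[[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]],
    adapted_text_feats=[[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
)
```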
Published
2026-03-14
How to Cite
Park, J., Han, D.-J., & Moon, J. (2026). ProLoG: Hybrid Prompt and LoRA Based Adaptation of Vision-Language Models for OOD Generalization. Proceedings of the AAAI Conference on Artificial Intelligence, 40(29), 24782-24791. https://doi.org/10.1609/aaai.v40i29.39664
Section
AAAI Technical Track on Machine Learning VI