Weak Distribution Detectors Lead to Stronger Generalizability of Vision-Language Prompt Tuning

Authors

  • Kun Ding, State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences
  • Haojian Zhang, Engineering Laboratory for Intelligent Industrial Vision, Institute of Automation, Chinese Academy of Sciences
  • Qiang Yu, Research Center of Aerospace Information, Institute of Automation, Chinese Academy of Sciences
  • Ying Wang, State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences
  • Shiming Xiang, State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences
  • Chunhong Pan, Research Center of Aerospace Information, Institute of Automation, Chinese Academy of Sciences

DOI:

https://doi.org/10.1609/aaai.v38i2.27918

Keywords:

CV: Language and Vision, CV: Multi-modal Vision, ML: Transfer, Domain Adaptation, Multi-Task Learning

Abstract

We propose a generalized method for boosting the generalization ability of pre-trained vision-language models (VLMs) when fine-tuning on downstream few-shot tasks. The idea is realized by exploiting out-of-distribution (OOD) detection to predict whether a sample belongs to a base distribution or a novel distribution, and then using the score generated by a dedicated competition-based scoring function to fuse the zero-shot and few-shot classifiers. The fused classifier is dynamic: it biases towards the zero-shot classifier if a sample is more likely to come from the distribution the model was pre-trained on, leading to improved base-to-novel generalization ability. Our method operates only at test time, so it can boost existing methods without time-consuming re-training. Extensive experiments show that even weak distribution detectors can improve VLMs' generalization ability. Specifically, with the help of OOD detectors, the harmonic means of CoOp and ProGrad increase by 2.6 and 1.5 percentage points, respectively, over 11 recognition datasets in the base-to-novel setting.
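The test-time fusion described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the paper's competition-based scoring function, so the ratio of maximum softmax confidences used below is a hypothetical stand-in, and the names `softmax`, `base_score`, and `fuse` are assumptions made for illustration only. The sketch shows just the general shape of the idea: a per-sample score in (0, 1) weights the few-shot classifier against the zero-shot classifier, with no re-training involved.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def base_score(p_fs, p_zs):
    """Hypothetical weak detector (NOT the paper's scoring function).

    Compares the maximum softmax confidence of the few-shot classifier
    with that of the zero-shot classifier. A value near 1 suggests the
    sample resembles the base (few-shot) distribution; a value near 0
    suggests a novel sample.
    """
    c_fs, c_zs = max(p_fs), max(p_zs)
    return c_fs / (c_fs + c_zs)


def fuse(logits_fs, logits_zs):
    """Fuse few-shot and zero-shot predictions for one sample.

    Returns the fused class probabilities and the per-sample weight s
    that was given to the few-shot classifier.
    """
    p_fs = softmax(logits_fs)
    p_zs = softmax(logits_zs)
    s = base_score(p_fs, p_zs)
    # Convex combination: the fused vector is still a distribution.
    fused = [s * a + (1.0 - s) * b for a, b in zip(p_fs, p_zs)]
    return fused, s


# Few-shot classifier is confident, zero-shot is not: lean on few-shot.
fused_base, s_base = fuse([5.0, 0.0, 0.0], [0.0, 0.0, 0.0])

# Zero-shot classifier is confident, few-shot is not: lean on zero-shot.
fused_novel, s_novel = fuse([0.0, 0.0, 0.0], [0.0, 5.0, 0.0])
```

Because the weight is computed per sample from the two classifiers' outputs alone, a sketch like this can wrap any already-trained prompt-tuning model. A stronger detector would simply replace `base_score`.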

Published

2024-03-24

How to Cite

Ding, K., Zhang, H., Yu, Q., Wang, Y., Xiang, S., & Pan, C. (2024). Weak Distribution Detectors Lead to Stronger Generalizability of Vision-Language Prompt Tuning. Proceedings of the AAAI Conference on Artificial Intelligence, 38(2), 1528-1536. https://doi.org/10.1609/aaai.v38i2.27918

Issue

Section

AAAI Technical Track on Computer Vision I