Text and Image Are Mutually Beneficial: Enhancing Training-Free Few-Shot Classification with CLIP
DOI:
https://doi.org/10.1609/aaai.v39i5.32534

Abstract
Contrastive Language-Image Pretraining (CLIP) has been widely used in vision tasks. Notably, CLIP has demonstrated promising performance in few-shot learning (FSL). However, existing CLIP-based methods for training-free FSL (i.e., methods that require no additional training) mainly handle the two modalities independently, leading to two essential issues: 1) severe anomalous matches in the image modality; 2) varying quality of generated text prompts. To address these issues, we build a mutual guidance mechanism that introduces an Image-Guided-Text (IGT) component to rectify the varying quality of text prompts through image representations, and a Text-Guided-Image (TGI) component to mitigate anomalous matches in the image modality through text representations. Integrating IGT and TGI under this perspective of Text-Image Mutual guidance Optimization, we propose TIMO. Extensive experiments show that TIMO significantly outperforms the state-of-the-art (SOTA) training-free method. Additionally, by exploring the extent of mutual guidance, we propose an enhanced variant, TIMO-S, which even surpasses the best training-required methods by 0.33% with approximately 100× less time cost.

Published
2025-04-11
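The mutual-guidance idea from the abstract can be illustrated with a toy sketch. This is not the paper's exact formulation: the function name, the prototype-based IGT weighting, and the per-sample TGI weighting below are all illustrative assumptions, meant only to show how each modality can reweight the other in a training-free classifier.

```python
import numpy as np

def normalize(x, axis=-1):
    """L2-normalize features along the given axis."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def timo_style_logits(img_feat, support_imgs, support_labels, text_feats,
                      n_cls, alpha=1.0):
    """Toy sketch of text-image mutual guidance (not the paper's method).

    img_feat:       (d,)  query image feature
    support_imgs:   (k, d) few-shot support image features
    support_labels: (k,)  integer class labels of the support images
    text_feats:     (n_cls, d) class text-prompt features
    """
    img_feat = normalize(img_feat)
    support_imgs = normalize(support_imgs)
    text_feats = normalize(text_feats)

    # Image-Guided-Text (IGT): score each class prompt by its agreement
    # with that class's support-image prototype, then reweight the text
    # logits so low-quality prompts contribute less.
    protos = np.stack([normalize(support_imgs[support_labels == c].mean(0))
                       for c in range(n_cls)])
    igt_weight = (protos * text_feats).sum(-1)           # (n_cls,)
    text_logits = (img_feat @ text_feats.T) * igt_weight

    # Text-Guided-Image (TGI): down-weight support images that agree
    # poorly with their class prompt (possible anomalous matches).
    tgi_weight = (support_imgs * text_feats[support_labels]).sum(-1)  # (k,)
    sims = (img_feat @ support_imgs.T) * tgi_weight                   # (k,)
    cache_logits = np.zeros(n_cls)
    for c in range(n_cls):
        cache_logits[c] = sims[support_labels == c].sum()

    # Fuse the two guided predictions; alpha balances the modalities.
    return text_logits + alpha * cache_logits
```

In this sketch, a prompt that drifts away from its class's image prototype is down-weighted (IGT), and a support image that does not match its class prompt contributes less to the image-side cache score (TGI) — each modality corrects the other without any gradient updates.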
How to Cite
Li, Y., Guo, J., Qi, L., Li, W., & Shi, Y. (2025). Text and Image Are Mutually Beneficial: Enhancing Training-Free Few-Shot Classification with CLIP. Proceedings of the AAAI Conference on Artificial Intelligence, 39(5), 5039–5047. https://doi.org/10.1609/aaai.v39i5.32534
Section
AAAI Technical Track on Computer Vision IV