TIM++: Transductive Information Maximization for Few-Shot CLIP
DOI:
https://doi.org/10.1609/aaai.v40i8.37598
Abstract
Transductive Information Maximization (TIM) is a leading transductive few-shot learning method that maximizes the mutual information between query features and their predicted labels, while incorporating supervision from the support set. However, its potential remains underexplored, primarily due to its limited utilization of the textual knowledge provided by vision-language models (VLMs) such as CLIP. To address this, we propose TIM++, an enhanced framework that incorporates both visual and textual information for few-shot CLIP adaptation. Specifically, TIM++ introduces a Kullback-Leibler (KL) divergence-based regularization term that encourages the model's posterior predictions to align with CLIP's zero-shot output distribution, focusing especially on the most confident predictions. Additionally, we develop an improved prototype initialization strategy that leverages both support and query features enriched with CLIP-guided semantics. Extensive experiments on 11 public datasets demonstrate that TIM++ consistently outperforms the standard TIM, achieving average accuracy gains of 19.25% and 10.88% in the 1-shot and 2-shot settings, respectively. TIM++ also surpasses existing state-of-the-art methods, establishing a new benchmark for few-shot learning with VLMs.
Published
2026-03-14
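The KL-divergence regularizer described in the abstract can be illustrated with a minimal sketch. The paper's exact formulation is not given on this page, so the confidence-filtering rule (a simple threshold on CLIP's zero-shot probabilities) and all function names here are hypothetical assumptions, not the authors' implementation:

```python
import math

def softmax(logits):
    """Convert a list of logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum_i p_i * log(p_i / q_i), with eps for stability."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def kl_regularizer(model_logits, clip_logits, conf_threshold=0.5):
    """Average KL(model posterior || CLIP zero-shot distribution) over query
    samples whose CLIP zero-shot confidence exceeds a threshold.
    The thresholding rule is a hypothetical stand-in for the paper's
    'most confident predictions' selection."""
    total, count = 0.0, 0
    for ml, cl in zip(model_logits, clip_logits):
        p = softmax(ml)  # model's posterior prediction for one query sample
        q = softmax(cl)  # CLIP's zero-shot output distribution for the same sample
        if max(q) >= conf_threshold:  # keep only confident zero-shot predictions
            total += kl_divergence(p, q)
            count += 1
    return total / count if count else 0.0
```

Minimizing this term pulls the adapted model's predictions toward CLIP's zero-shot outputs where CLIP is confident, which is one plausible way the textual knowledge of the VLM could supervise transductive adaptation.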
How to Cite
Li, Y., Zou, Y., Huang, Y., Jiao, C., Wang, X., Peng, S., Guo, Z., & Gou, S. (2026). TIM++: Transductive Information Maximization for Few-Shot CLIP. Proceedings of the AAAI Conference on Artificial Intelligence, 40(8), 6671-6680. https://doi.org/10.1609/aaai.v40i8.37598
Issue
Section
AAAI Technical Track on Computer Vision V