Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification

Authors

  • Wenshuo Peng Shanghai AI Laboratory
  • Kaipeng Zhang Shanghai AI Laboratory
  • Yue Yang Shanghai AI Laboratory; Shanghai Jiao Tong University
  • Hao Zhang Shanghai AI Laboratory; Xi'an Jiaotong University
  • Yu Qiao Shanghai AI Laboratory

DOI:

https://doi.org/10.1609/aaai.v38i5.28249

Keywords:

CV: Language and Vision; CV: Large Vision Models; ML: Transfer, Domain Adaptation, Multi-Task Learning; ML: Unsupervised & Self-Supervised Learning

Abstract

Vision-language foundation models have been highly successful across a wide range of downstream computer vision tasks when combined with adaptation methods. However, because obtaining pre-training datasets is costly, these datasets contain large numbers of pairs with weak image-text correlation. We call them weak-paired samples. Because of these weak-paired samples, the pre-trained model is unable to mine all the knowledge in the pre-training data. Existing adaptation methods do not account for this missing knowledge, so knowledge that is crucial to the downstream task may be ignored. To address this issue, we propose a new adaptation framework called Data Adaptive Traceback (DAT). Specifically, we use a zero-shot-based method to extract the subset of the pre-training data most related to the downstream task and use it to support that task. Furthermore, we adopt a pseudo-label-based semi-supervised technique to reuse the pre-training images and a vision-language contrastive learning method to address the confirmation bias issue in semi-supervised learning. We conduct extensive experiments showing that the proposed DAT approach meaningfully improves performance on various benchmark datasets over traditional adaptation methods.
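
The zero-shot-based subset extraction mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes that image embeddings and per-class text embeddings have already been produced by some vision-language model, and the function names (`cosine`, `softmax`, `select_task_related`), the temperature value, and the top-k confidence rule are all hypothetical choices made for illustration. Each pre-training image is scored against the downstream class prompts, and the k most confidently classified images are kept together with their predicted class, which could then serve as a pseudo-label.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(scores, temperature):
    """Numerically stable softmax over similarity scores."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_task_related(image_embs, class_text_embs, k, temperature=0.01):
    """Keep the k images with the highest zero-shot confidence.

    Assumption (not from the paper): confidence is the maximum softmax
    probability over downstream classes. Returns a list of
    (image_index, predicted_class, confidence) tuples.
    """
    scored = []
    for idx, img in enumerate(image_embs):
        sims = [cosine(img, txt) for txt in class_text_embs]
        probs = softmax(sims, temperature)
        label = max(range(len(probs)), key=probs.__getitem__)
        scored.append((probs[label], idx, label))
    # Most confident images first; ties keep their original order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(idx, label, conf) for conf, idx, label in scored[:k]]


# Toy usage with 2-D embeddings: two downstream classes, three images.
class_text_embs = [[1.0, 0.0], [0.0, 1.0]]
image_embs = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.8]]
selected = select_task_related(image_embs, class_text_embs, k=2)
```

In this toy setup the ambiguous second image, which is equally similar to both classes, is the one left out. A real pipeline would replace the hand-written vectors with encoder outputs and would likely choose the selection rule and subset size per downstream dataset.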

Published

2024-03-24

How to Cite

Peng, W., Zhang, K., Yang, Y., Zhang, H., & Qiao, Y. (2024). Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification. Proceedings of the AAAI Conference on Artificial Intelligence, 38(5), 4506-4514. https://doi.org/10.1609/aaai.v38i5.28249

Section

AAAI Technical Track on Computer Vision IV