Fine-Grained Retrieval Prompt Tuning

Shijie Wang; Jianlong Chang; Zhihui Wang; Haojie Li; Wanli Ouyang; Qi Tian

doi:10.1609/aaai.v37i2.25363

Authors

Shijie Wang, International School of Information Science & Engineering, Dalian University of Technology, China
Jianlong Chang, Huawei Cloud & AI, China
Zhihui Wang, International School of Information Science & Engineering, Dalian University of Technology, China
Haojie Li, International School of Information Science & Engineering, Dalian University of Technology, China; College of Computer and Engineering, Shandong University of Science and Technology, China
Wanli Ouyang, SenseTime Computer Vision Research Group, The University of Sydney, Australia
Qi Tian, Huawei Cloud & AI, China

Keywords:

CV: Image and Video Retrieval

Abstract

Fine-grained object retrieval aims to learn discriminative representations for retrieving visually similar objects. However, existing top-performing works usually either impose pairwise similarities on the semantic embedding space or design a localization sub-network, and in both cases they continually fine-tune the entire model in limited-data scenarios, which results in convergence to suboptimal solutions. In this paper, we develop Fine-grained Retrieval Prompt Tuning (FRPT), which steers a frozen pre-trained model to perform the fine-grained retrieval task from the perspectives of sample prompting and feature adaptation. Specifically, FRPT learns only a small number of parameters in the prompt and adaptation modules instead of fine-tuning the entire model, thereby addressing the convergence to suboptimal solutions caused by fine-tuning the entire model. Technically, a discriminative perturbation prompt (DPP) is introduced as the sample prompting process; it amplifies, and even exaggerates, the discriminative elements that contribute to category prediction via a content-aware inhomogeneous sampling operation. In this way, DPP brings the fine-grained retrieval task, aided by the perturbation prompts, close to the task solved during the original pre-training, and thereby preserves the generalization and discrimination of the representations extracted from input samples. In addition, a category-specific awareness head is proposed as the feature adaptation; it removes the species discrepancies in the features extracted by the pre-trained model using category-guided instance normalization, so that the optimized features include only the discrepancies among subcategories. Extensive experiments demonstrate that FRPT, with fewer learnable parameters, achieves state-of-the-art performance on three widely used fine-grained datasets.
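
As a rough illustration of the normalization step mentioned in the abstract, the following is a minimal sketch of plain instance normalization applied to the feature map of a single sample. It is not the paper's category-guided variant: the abstract does not say how the category guidance enters, so none is modeled here. The function name `instance_norm`, the `eps` value, and the toy feature values are all assumptions made for illustration.

```python
import math


def instance_norm(feature_map, eps=1e-5):
    """Normalize each channel of one sample to zero mean and (near) unit variance.

    feature_map: list of channels, each a flat list of spatial activations.
    This is generic instance normalization, shown only as an assumption-laden
    sketch; it omits any category guidance.
    """
    normalized = []
    for channel in feature_map:
        mean = sum(channel) / len(channel)
        var = sum((v - mean) ** 2 for v in channel) / len(channel)
        scale = 1.0 / math.sqrt(var + eps)
        normalized.append([(v - mean) * scale for v in channel])
    return normalized


# Hypothetical feature map: 2 channels, 4 spatial positions per channel.
features = [[1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 30.0, 30.0]]
normed = instance_norm(features)
```

After normalization, the two channels share the same per-sample statistics even though their original scales differ by an order of magnitude, which is the general sense in which such a step can remove a shared offset and leave only relative differences.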
