WST: Wavelet-Based Multi-scale Tuning for Visual Transfer Learning

Jia Zeng; Lan Huang; Kangping Wang

doi:10.1609/aaai.v39i21.34387

Authors

Jia Zeng, College of Computer Science and Technology, Jilin University
Lan Huang, College of Computer Science and Technology, Jilin University; Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University
Kangping Wang, College of Computer Science and Technology, Jilin University; Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University

DOI:

https://doi.org/10.1609/aaai.v39i21.34387

Abstract

Large-scale pre-trained Vision Transformer (ViT) models have demonstrated remarkable performance on visual tasks but are computationally expensive to transfer to downstream tasks. Parameter-Efficient Fine-Tuning (PEFT) offers a promising way to transfer them by updating only a subset of parameters. However, PEFT's effectiveness is hindered by discrepancies between pre-training and downstream tasks in object scale and granularity. Downstream tasks often focus on finer-grained and more specialized recognition, which requires more detailed features, yet existing PEFT methods for ViT offer limited diversity in feature scales. To address this, we propose a novel PEFT method named Wavelet-based multi-Scale Tuning (WST), which learns multi-scale features in a simple and efficient way. WST introduces a parallel fine-tuning patch embedding branch with a smaller patch size than the pre-trained model to capture finer-grained features. Furthermore, to handle the computational cost of the resulting longer token sequence, WST designs wavelet fine-tuning blocks that balance efficiency and performance. Within each block, a wavelet transform enables invertible and lossless down-sampling of the longer token sequence, aligning its length with that of the backbone, and two lightweight linear mappings are employed to learn task-specific features. This design facilitates efficient multi-scale information exchange between the pre-trained backbone and the fine-tuning branch. Extensive experiments on transfer learning demonstrate the promising performance and efficiency of WST.
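
The invertible, lossless down-sampling mentioned in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: a single-level 2D Haar wavelet, a branch patch size half that of the backbone (so the branch token grid is twice as large per side), and the four sub-bands stacked along the channel axis. The function names `haar_down` and `haar_up` are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def haar_down(x):
    """Single-level 2D Haar transform of a token grid (assumed wavelet choice).

    x: array of shape (H, W, C) with even H and W.
    Returns an array of shape (H/2, W/2, 4C): the four sub-bands
    (LL, LH, HL, HH) stacked on the channel axis.
    """
    a = x[0::2, 0::2]  # top-left token of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2
    lh = (a - b + c - d) / 2
    hl = (a + b - c - d) / 2
    hh = (a - b - c + d) / 2
    return np.concatenate([ll, lh, hl, hh], axis=-1)


def haar_up(y):
    """Inverse of haar_down: recovers the (H, W, C) grid exactly."""
    ll, lh, hl, hh = np.split(y, 4, axis=-1)
    h, w, c = ll.shape
    x = np.empty((2 * h, 2 * w, c), dtype=y.dtype)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x


# Assumed sizes: a 224x224 image with patch size 8 gives a 28x28 token grid,
# while a backbone with patch size 16 would have a 14x14 grid.
rng = np.random.default_rng(0)
fine_tokens = rng.standard_normal((28, 28, 8))

coarse = haar_down(fine_tokens)      # token count now matches the 14x14 grid
restored = haar_up(coarse)           # no information was discarded
```

In this sketch the token count drops by a factor of four while the channel dimension grows by the same factor, so nothing is lost; a subsequent linear mapping could then project the wider channels to whatever width the branch uses.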
