WST: Wavelet-Based Multi-scale Tuning for Visual Transfer Learning

Authors

  • Jia Zeng, College of Computer Science and Technology, Jilin University
  • Lan Huang, College of Computer Science and Technology, Jilin University; Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University
  • Kangping Wang, College of Computer Science and Technology, Jilin University; Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University

DOI:

https://doi.org/10.1609/aaai.v39i21.34387

Abstract

Large-scale pre-trained Vision Transformer (ViT) models have demonstrated remarkable performance on visual tasks but are computationally expensive to transfer to downstream tasks. Parameter-Efficient Fine-Tuning (PEFT) offers a promising approach to transfer by updating only a subset of parameters. However, PEFT's effectiveness is hindered by discrepancies between pre-training and downstream tasks in object scale and granularity. Downstream tasks often focus on finer-grained, more specialized recognition and therefore require more detailed features, yet existing PEFT methods for ViT offer limited diversity of feature scales. To address this, we propose a novel PEFT method named Wavelet-based multi-Scale Tuning (WST), which learns multi-scale features in a simple and efficient way. WST introduces a parallel fine-tuning patch embedding branch with a smaller patch size than that of the pre-trained model to capture finer-grained features. Furthermore, to handle the computational cost of the resulting longer token sequence, WST uses wavelet fine-tuning blocks that balance efficiency and performance. Within each block, a wavelet transform provides invertible, lossless down-sampling of the longer token sequence, aligning its length with that of the backbone's token sequence, and two lightweight linear mappings learn task-specific features. This design facilitates efficient multi-scale information exchange between the pre-trained backbone and the fine-tuning branch. Extensive experiments on transfer learning demonstrate the promising performance and efficiency of WST.
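
To illustrate what "invertible and lossless down-sampling" of a token sequence can mean, the following is a minimal sketch, not the authors' implementation. It assumes a single-level 2D Haar wavelet (the abstract does not name the wavelet family) and assumes the fine-tuning branch halves the patch size, so its token grid is twice as large on each side as the backbone's. The function names haar_down and haar_up and the toy grid are hypothetical, and the two lightweight linear mappings are omitted.

```python
# Assumption: single-level 2D Haar wavelet; the abstract does not specify the family.
# A token grid is a nested list: grid[row][col] is a list of C channel values.

def haar_down(grid):
    """Map a (2H, 2W, C) token grid to (H, W, 4C) Haar coefficients."""
    rows, cols = len(grid), len(grid[0])
    assert rows % 2 == 0 and cols % 2 == 0, "grid sides must be even"
    out = []
    for i in range(0, rows, 2):
        out_row = []
        for j in range(0, cols, 2):
            # The four tokens of one 2x2 neighbourhood.
            a, b = grid[i][j], grid[i][j + 1]
            c, d = grid[i + 1][j], grid[i + 1][j + 1]
            quad = list(zip(a, b, c, d))
            ll = [(p + q + r + s) / 2 for p, q, r, s in quad]  # local average
            lh = [(p - q + r - s) / 2 for p, q, r, s in quad]  # left minus right
            hl = [(p + q - r - s) / 2 for p, q, r, s in quad]  # top minus bottom
            hh = [(p - q - r + s) / 2 for p, q, r, s in quad]  # diagonal detail
            out_row.append(ll + lh + hl + hh)  # concatenate the four sub-bands
        out.append(out_row)
    return out


def haar_up(coeffs):
    """Invert haar_down: (H, W, 4C) coefficients back to a (2H, 2W, C) grid."""
    out = []
    for row in coeffs:
        top, bottom = [], []
        for tok in row:
            n = len(tok) // 4
            quad = list(zip(tok[:n], tok[n:2 * n], tok[2 * n:3 * n], tok[3 * n:]))
            # The 2x2 Haar matrix is symmetric and orthogonal, so it is its own inverse.
            a = [(p + q + r + s) / 2 for p, q, r, s in quad]
            b = [(p - q + r - s) / 2 for p, q, r, s in quad]
            c = [(p + q - r - s) / 2 for p, q, r, s in quad]
            d = [(p - q - r + s) / 2 for p, q, r, s in quad]
            top += [a, b]
            bottom += [c, d]
        out += [top, bottom]
    return out


# Toy example: a 4x4 grid of fine tokens with C = 2 channels each.
fine = [[[float(4 * i + j), float(i - j)] for j in range(4)] for i in range(4)]
coarse = haar_down(fine)    # 2x2 grid, 8 channels per token
restored = haar_up(coarse)  # back to the 4x4 grid, 2 channels per token
```

In this toy setting the 16 fine tokens become 4 coarse tokens, each with four times as many channels, so the sequence is four times shorter while the total number of values is unchanged and the original grid can be recovered from the coefficients.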

Published

2025-04-11

How to Cite

Zeng, J., Huang, L., & Wang, K. (2025). WST: Wavelet-Based Multi-scale Tuning for Visual Transfer Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 39(21), 22317–22325. https://doi.org/10.1609/aaai.v39i21.34387

Section

AAAI Technical Track on Machine Learning VII