Width & Depth Pruning for Vision Transformers

Authors

  • Fang Yu, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences
  • Kun Huang, Ant Financial Services Group
  • Meng Wang, Ant Financial Services Group
  • Yuan Cheng, Ant Financial Services Group
  • Wei Chu, Ant Financial Services Group
  • Li Cui, Institute of Computing Technology, Chinese Academy of Sciences

DOI:

https://doi.org/10.1609/aaai.v36i3.20222

Keywords:

Computer Vision (CV)

Abstract

Transformer models have demonstrated promising potential and achieved excellent performance on a range of computer vision tasks. However, the huge computational cost of vision transformers hinders their deployment on edge devices. Recent works propose to find and remove the unimportant units of vision transformers. Despite achieving remarkable results, these methods consider only one dimension, network width, and ignore network depth, which is another important dimension for pruning vision transformers. Therefore, we propose a Width & Depth Pruning (WDPruning) framework that reduces both dimensions simultaneously. Specifically, for width pruning, a set of learnable pruning-related parameters adaptively adjusts the width of the transformer. For depth pruning, we introduce several shallow classifiers that use the intermediate information of the transformer blocks, allowing images to be classified by a shallow classifier rather than the final, deeper one. At inference time, all blocks after a shallow classifier can be dropped, so they incur no additional parameters or computation. Experimental results on benchmark datasets demonstrate that the proposed method significantly reduces the computational cost of mainstream vision transformers such as DeiT and Swin Transformer with only a minor accuracy drop. In particular, on ILSVRC-12, compressing DeiT-Base yields a FLOPs reduction of over 22% together with an increase of 0.14% in Top-1 accuracy.
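
The two mechanisms described in the abstract can be illustrated with a short sketch. The following is a minimal, PyTorch-style illustration under stated assumptions, not the paper's implementation: the class names (`WidthMask`, `ShallowClassifier`), the sigmoid relaxation, and the straight-through estimator are assumptions made for clarity; the paper's learnable pruning-related parameters and shallow classifiers may be parameterized differently.

```python
import torch
import torch.nn as nn


class WidthMask(nn.Module):
    """Hypothetical mask over a transformer layer's hidden dimensions.

    A learnable saliency score per dimension is compared against a
    learnable threshold; dimensions scoring below the threshold are
    suppressed. A straight-through estimator keeps the hard 0/1 mask
    usable during gradient-based training.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.scores = nn.Parameter(torch.zeros(dim))      # per-dimension saliency (assumed form)
        self.threshold = nn.Parameter(torch.tensor(0.0))  # learnable cut-off (assumed form)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        hard = (self.scores > self.threshold).float()       # 0/1 keep mask
        soft = torch.sigmoid(self.scores - self.threshold)  # differentiable proxy
        mask = hard + soft - soft.detach()                  # straight-through trick
        return x * mask                                     # zero out pruned dimensions


class ShallowClassifier(nn.Module):
    """Hypothetical lightweight head on an intermediate transformer block.

    If images can already be classified here, all deeper blocks can be
    dropped at inference time, reducing depth at no extra cost.
    """

    def __init__(self, dim: int, num_classes: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, cls_token: torch.Tensor) -> torch.Tensor:
        return self.head(self.norm(cls_token))
```

In a scheme of this kind, a `WidthMask` would wrap each projection so that dimensions falling below the learned threshold can be physically removed after training, while a `ShallowClassifier` attached to an intermediate block lets all subsequent blocks be discarded entirely at inference.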

Published

2022-06-28

How to Cite

Yu, F., Huang, K., Wang, M., Cheng, Y., Chu, W., & Cui, L. (2022). Width & Depth Pruning for Vision Transformers. Proceedings of the AAAI Conference on Artificial Intelligence, 36(3), 3143-3151. https://doi.org/10.1609/aaai.v36i3.20222

Section

AAAI Technical Track on Computer Vision III