Width & Depth Pruning for Vision Transformers

Authors

  • Fang Yu, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences
  • Kun Huang, Ant Financial Services Group
  • Meng Wang, Ant Financial Services Group
  • Yuan Cheng, Ant Financial Services Group
  • Wei Chu, Ant Financial Services Group
  • Li Cui, Institute of Computing Technology, Chinese Academy of Sciences

DOI:

https://doi.org/10.1609/aaai.v36i3.20222

Keywords:

Computer Vision (CV)

Abstract

Transformer models have demonstrated promising potential and achieved excellent performance on a range of computer vision tasks. However, the huge computational cost of vision transformers hinders their deployment on edge devices. Recent works propose to find and remove the unimportant units of vision transformers. Despite achieving remarkable results, these methods consider only one dimension, network width, and ignore network depth, which is another important dimension for pruning vision transformers. Therefore, we propose a Width & Depth Pruning (WDPruning) framework that reduces both dimensions simultaneously. Specifically, for width pruning, a set of learnable pruning-related parameters adaptively adjusts the width of the transformer. For depth pruning, we introduce several shallow classifiers that use the intermediate information of the transformer blocks, allowing images to be classified by a shallow classifier rather than the final, deeper one. At inference time, all blocks after a shallow classifier can be dropped, so they incur no additional parameters or computation. Experimental results on benchmark datasets demonstrate that the proposed method significantly reduces the computational cost of mainstream vision transformers such as DeiT and Swin Transformer with only a minor accuracy drop. In particular, on ILSVRC-12, compressing DeiT-Base yields a FLOPs reduction of over 22% together with an increase of 0.14% in Top-1 accuracy.
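
The two mechanisms described in the abstract can be illustrated with a short sketch. The following is a minimal, PyTorch-style illustration under stated assumptions, not the paper's implementation: the class names (`WidthMask`, `ShallowClassifier`), the sigmoid relaxation, and the straight-through estimator are assumptions made for clarity; the paper's learnable pruning-related parameters and shallow classifiers may be parameterized differently.

```python
import torch
import torch.nn as nn


class WidthMask(nn.Module):
    """Hypothetical mask over a transformer layer's hidden dimensions.

    A learnable saliency score per dimension is compared against a
    learnable threshold; dimensions scoring below the threshold are
    suppressed. A straight-through estimator keeps the hard 0/1 mask
    usable during gradient-based training.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.scores = nn.Parameter(torch.zeros(dim))      # per-dimension saliency (assumed form)
        self.threshold = nn.Parameter(torch.tensor(0.0))  # learnable cut-off (assumed form)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        hard = (self.scores > self.threshold).float()       # 0/1 keep mask
        soft = torch.sigmoid(self.scores - self.threshold)  # differentiable proxy
        mask = hard + soft - soft.detach()                  # straight-through trick
        return x * mask                                     # zero out pruned dimensions


class ShallowClassifier(nn.Module):
    """Hypothetical lightweight head on an intermediate transformer block.

    If images can already be classified here, all deeper blocks can be
    dropped at inference time, reducing depth at no extra cost.
    """

    def __init__(self, dim: int, num_classes: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, cls_token: torch.Tensor) -> torch.Tensor:
        return self.head(self.norm(cls_token))
```

In a scheme of this kind, a `WidthMask` would wrap each projection so that dimensions falling below the learned threshold can be physically removed after training, while a `ShallowClassifier` attached to an intermediate block lets all subsequent blocks be discarded entirely at inference.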

Published

2022-06-28

How to Cite

Yu, F., Huang, K., Wang, M., Cheng, Y., Chu, W., & Cui, L. (2022). Width & Depth Pruning for Vision Transformers. Proceedings of the AAAI Conference on Artificial Intelligence, 36(3), 3143-3151. https://doi.org/10.1609/aaai.v36i3.20222

Section

AAAI Technical Track on Computer Vision III