Bi-ViT: Pushing the Limit of Vision Transformer Quantization

Yanjing Li; Sheng Xu; Mingbao Lin; Xianbin Cao; Chuanjian Liu; Xiao Sun; Baochang Zhang

doi:10.1609/aaai.v38i4.28109

Authors

Yanjing Li, Beihang University
Sheng Xu, Beihang University
Mingbao Lin, Tencent
Xianbin Cao, Beihang University, China
Chuanjian Liu, Huawei Noah's Ark Lab
Xiao Sun, Shanghai Artificial Intelligence Laboratory
Baochang Zhang, Zhongguancun Laboratory Hangzhou Research Institute, Beihang University Nanchang Institute of Technology

DOI:

https://doi.org/10.1609/aaai.v38i4.28109

Keywords:

CV: Object Detection & Categorization

Abstract

Quantization of vision transformers (ViTs) offers a promising way to deploy large pre-trained networks on resource-limited devices. Fully binarized ViTs (Bi-ViT), which push ViT quantization to its limit, remain largely unexplored and very challenging because of their unacceptable performance. Through extensive empirical analyses, we identify that the severe performance drop in ViT binarization is caused by attention distortion in self-attention, which technically stems from gradient vanishing and ranking disorder. To address these issues, we first introduce a learnable scaling factor to reactivate the vanished gradients and illustrate its effectiveness through theoretical and experimental analyses. We then propose a ranking-aware distillation method to rectify the disordered ranking in a teacher-student framework. Bi-ViT achieves significant improvements over popular DeiT and Swin backbones in terms of Top-1 accuracy and FLOPs. For example, with DeiT-Tiny and Swin-Tiny, our method outperforms baselines by 22.1% and 21.4%, respectively, while achieving 61.5x and 56.1x theoretical acceleration in terms of FLOPs compared with real-valued counterparts on ImageNet. Our code and models are available at https://github.com/YanjingLi0202/Bi-ViT/.
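
To make the notion of "ranking disorder" concrete, the following is a minimal illustrative sketch of a generic pairwise ranking penalty between a teacher and a student attention row. It is not the loss used in Bi-ViT: the function names `softmax` and `pairwise_ranking_loss`, the hinge form, and the `margin` parameter are all assumptions chosen for illustration, and the example scores are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of attention logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pairwise_ranking_loss(teacher, student, margin=0.0):
    """Hypothetical hinge penalty (an assumption, not the paper's loss).

    For every pair (i, j) where the teacher ranks token i above token j,
    the student is penalized when it does not keep the same order.
    The result is averaged over the number of such pairs.
    """
    loss = 0.0
    pairs = 0
    n = len(teacher)
    for i in range(n):
        for j in range(n):
            if teacher[i] > teacher[j]:
                pairs += 1
                loss += max(0.0, margin - (student[i] - student[j]))
    return loss / pairs if pairs else 0.0


# Made-up attention logits for one query over three tokens.
teacher_attn = softmax([2.0, 1.0, 0.0])
agree = softmax([1.5, 0.5, -0.5])    # same ordering as the teacher
disagree = softmax([0.0, 1.0, 2.0])  # reversed ordering

loss_agree = pairwise_ranking_loss(teacher_attn, agree)
loss_disagree = pairwise_ranking_loss(teacher_attn, disagree)
```

With `margin=0.0`, a student row that preserves the teacher's ordering incurs no penalty, while a row with reversed ordering incurs a positive one. A training setup would apply such a term to differentiable tensors rather than Python lists.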
