Distribution Adaptive INT8 Quantization for Training CNNs

Authors

  • Kang Zhao Alibaba
  • Sida Huang Alibaba
  • Pan Pan Alibaba
  • Yinghan Li Alibaba
  • Yingya Zhang Alibaba
  • Zhenyu Gu Alibaba
  • Yinghui Xu Alibaba

DOI:

https://doi.org/10.1609/aaai.v35i4.16462

Keywords:

Learning & Optimization for CV, Other Foundations of Computer Vision, Optimization, Applications

Abstract

Research has demonstrated that low-bit-width (e.g., INT8) quantization can be employed to accelerate inference. This makes gradient quantization very promising, since backward propagation requires approximately twice as much computation as the forward pass. Owing to the variability and uncertainty of gradient distributions, many methods have been proposed to attain training stability. However, most of them ignore the channel-wise gradient distributions and the impact of gradients with different magnitudes, resulting in degraded final accuracy. In this paper, we propose a novel INT8 quantization training framework for convolutional neural networks to address these issues. Specifically, we adopt Gradient Vectorized Quantization to quantize the gradient, based on the observation that layer-wise gradients contain multiple distributions along the channel dimension. We then introduce a Magnitude-aware Clipping Strategy that takes the magnitudes of gradients into account when minimizing the quantization error, and we present a theoretical derivation to solve for the quantization parameters of different distributions. Experimental results on a broad range of computer vision tasks, such as image classification, object detection and video classification, demonstrate that the proposed Distribution Adaptive INT8 Quantization training method achieves almost lossless training accuracy for different backbones, including ResNet, MobileNetV2, InceptionV3, VGG and AlexNet, outperforming state-of-the-art techniques. Moreover, we implement an INT8 kernel that accelerates the training iteration by more than 200% on the latest Turing architecture, i.e., our method excels in both training accuracy and speed.
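To make the two ideas in the abstract concrete, here is a minimal NumPy sketch of per-channel ("vectorized") symmetric INT8 quantization of a gradient tensor, where each channel's clipping threshold is chosen by minimizing a magnitude-weighted quantization error. This is an illustrative stand-in, not the authors' implementation: the function name, the grid search over candidate thresholds, and the `|g|`-weighted error are assumptions that replace the paper's closed-form derivation.

```python
import numpy as np

def quantize_grad_per_channel(grad, n_candidates=20):
    """Per-channel symmetric INT8 quantization of a gradient tensor of
    shape (N, C, H, W). For each channel, a clipping threshold is picked
    by grid search to minimize a magnitude-weighted quantization error --
    a simplified stand-in for a magnitude-aware clipping strategy.
    Returns the INT8 codes and one float scale per channel."""
    q = np.empty(grad.shape, dtype=np.int8)
    scales = np.empty(grad.shape[1], dtype=np.float64)
    for c in range(grad.shape[1]):
        g = grad[:, c].ravel()
        gmax = np.abs(g).max()
        if gmax == 0.0:  # all-zero channel: nothing to quantize
            scales[c] = 1.0
            q[:, c] = 0
            continue
        best_err, best_clip = np.inf, gmax
        # try candidate clipping thresholds between gmax/n and gmax
        for clip in np.linspace(gmax / n_candidates, gmax, n_candidates):
            s = clip / 127.0
            g_hat = np.clip(np.round(g / s), -127, 127) * s
            # weight the squared error by |g|, so that large-magnitude
            # gradients (which dominate the parameter update) count more
            err = np.sum(np.abs(g) * (g - g_hat) ** 2)
            if err < best_err:
                best_err, best_clip = err, clip
        s = best_clip / 127.0
        scales[c] = s
        q[:, c] = np.clip(np.round(grad[:, c] / s), -127, 127).astype(np.int8)
    return q, scales  # dequantize channel c as q[:, c] * scales[c]
```

A caller would dequantize with `q.astype(np.float64) * scales[None, :, None, None]`; in an actual INT8 training kernel the codes would instead feed an integer matrix multiply, with the per-channel scales folded in afterward.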

Published

2021-05-18

How to Cite

Zhao, K., Huang, S., Pan, P., Li, Y., Zhang, Y., Gu, Z., & Xu, Y. (2021). Distribution Adaptive INT8 Quantization for Training CNNs. Proceedings of the AAAI Conference on Artificial Intelligence, 35(4), 3483-3491. https://doi.org/10.1609/aaai.v35i4.16462

Section

AAAI Technical Track on Computer Vision III