Distribution Adaptive INT8 Quantization for Training CNNs

Authors

  • Kang Zhao Alibaba
  • Sida Huang Alibaba
  • Pan Pan Alibaba
  • Yinghan Li Alibaba
  • Yingya Zhang Alibaba
  • Zhenyu Gu Alibaba
  • Yinghui Xu Alibaba

DOI:

https://doi.org/10.1609/aaai.v35i4.16462

Keywords:

Learning & Optimization for CV, Other Foundations of Computer Vision, Optimization, Applications

Abstract

Research has demonstrated that low-bit-width (e.g., INT8) quantization can be employed to accelerate inference. This makes gradient quantization very promising, since backward propagation requires approximately twice as much computation as the forward pass. Owing to the variability and uncertainty of gradient distributions, many methods have been proposed to attain training stability. However, most of them ignore the channel-wise gradient distributions and the impact of gradients with different magnitudes, resulting in degraded final accuracy. In this paper, we propose a novel INT8 quantization training framework for convolutional neural networks to address these issues. Specifically, we adopt Gradient Vectorized Quantization to quantize the gradient, based on the observation that layer-wise gradients contain multiple distributions along the channel dimension. We then introduce a Magnitude-aware Clipping Strategy that takes the magnitudes of gradients into account when minimizing the quantization error, and we present a theoretical derivation to solve for the quantization parameters of different distributions. Experimental results on a broad range of computer vision tasks, such as image classification, object detection and video classification, demonstrate that the proposed Distribution Adaptive INT8 Quantization training method achieves almost lossless training accuracy for different backbones, including ResNet, MobileNetV2, InceptionV3, VGG and AlexNet, outperforming state-of-the-art techniques. Moreover, we implement an INT8 kernel that accelerates the training iteration by more than 200% on the latest Turing architecture, i.e., our method excels in both training accuracy and speed.
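To make the two ideas in the abstract concrete, here is a minimal NumPy sketch of per-channel ("vectorized") symmetric INT8 quantization of a gradient tensor, where each channel's clipping threshold is chosen by minimizing a magnitude-weighted quantization error. This is an illustrative stand-in, not the authors' implementation: the function name, the grid search over candidate thresholds, and the `|g|`-weighted error are assumptions that replace the paper's closed-form derivation.

```python
import numpy as np

def quantize_grad_per_channel(grad, n_candidates=20):
    """Per-channel symmetric INT8 quantization of a gradient tensor of
    shape (N, C, H, W). For each channel, a clipping threshold is picked
    by grid search to minimize a magnitude-weighted quantization error --
    a simplified stand-in for a magnitude-aware clipping strategy.
    Returns the INT8 codes and one float scale per channel."""
    q = np.empty(grad.shape, dtype=np.int8)
    scales = np.empty(grad.shape[1], dtype=np.float64)
    for c in range(grad.shape[1]):
        g = grad[:, c].ravel()
        gmax = np.abs(g).max()
        if gmax == 0.0:  # all-zero channel: nothing to quantize
            scales[c] = 1.0
            q[:, c] = 0
            continue
        best_err, best_clip = np.inf, gmax
        # try candidate clipping thresholds between gmax/n and gmax
        for clip in np.linspace(gmax / n_candidates, gmax, n_candidates):
            s = clip / 127.0
            g_hat = np.clip(np.round(g / s), -127, 127) * s
            # weight the squared error by |g|, so that large-magnitude
            # gradients (which dominate the parameter update) count more
            err = np.sum(np.abs(g) * (g - g_hat) ** 2)
            if err < best_err:
                best_err, best_clip = err, clip
        s = best_clip / 127.0
        scales[c] = s
        q[:, c] = np.clip(np.round(grad[:, c] / s), -127, 127).astype(np.int8)
    return q, scales  # dequantize channel c as q[:, c] * scales[c]
```

A caller would dequantize with `q.astype(np.float64) * scales[None, :, None, None]`; in an actual INT8 training kernel the codes would instead feed an integer matrix multiply, with the per-channel scales folded in afterward.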

Published

2021-05-18

How to Cite

Zhao, K., Huang, S., Pan, P., Li, Y., Zhang, Y., Gu, Z., & Xu, Y. (2021). Distribution Adaptive INT8 Quantization for Training CNNs. Proceedings of the AAAI Conference on Artificial Intelligence, 35(4), 3483-3491. https://doi.org/10.1609/aaai.v35i4.16462

Section

AAAI Technical Track on Computer Vision III