Towards Fully Sparse Training: Information Restoration with Spatial Similarity

Authors

  • Weixiang Xu, NLPR, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
  • Xiangyu He, NLPR, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
  • Ke Cheng, NLPR, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
  • Peisong Wang, NLPR, Institute of Automation, Chinese Academy of Sciences
  • Jian Cheng, NLPR, Institute of Automation, Chinese Academy of Sciences

DOI:

https://doi.org/10.1609/aaai.v36i3.20198

Keywords:

Computer Vision (CV)

Abstract

The 2:4 structured sparsity pattern introduced with the NVIDIA Ampere architecture, which requires at least two zeros in every four consecutive values, doubles the math throughput of matrix multiplications. Recent works mainly focus on inference speedup via 2:4 sparsity, while training acceleration has been largely overlooked, even though backpropagation consumes around 70% of the training time. However, unlike inference, training speedup with structured pruning is nontrivial: the fidelity of gradients must be maintained, and the additional overhead of imposing 2:4 sparsity online must be kept low. For the first time, this article proposes fully sparse training (FST), where `fully' indicates that ALL matrix multiplications in forward/backward propagation are structurally pruned while maintaining accuracy. To this end, we begin with a saliency analysis, investigating the sensitivity of different sparse objects to structured pruning. Based on the observation of spatial similarity among activations, we propose pruning activations with fixed 2:4 masks. Moreover, an Information Restoration block is proposed to retrieve the lost information, which can be implemented by an efficient gradient-shift operation. Evaluations of accuracy and efficiency show that we achieve 2× training acceleration with negligible accuracy degradation on challenging large-scale classification and detection tasks.
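To make the 2:4 pattern concrete, below is a minimal illustrative sketch (not the authors' implementation, and not the hardware-accelerated online procedure the paper targets) of magnitude-based 2:4 pruning in PyTorch: in every group of four consecutive values along the last dimension, the two smallest-magnitude entries are zeroed. The function name `prune_2_to_4` is an assumption introduced here for illustration only.

```python
# Illustrative sketch of 2:4 structured sparsity (assumption: simple
# magnitude-based masking along the last dimension; not the paper's
# fixed-mask activation pruning or Information Restoration block).
import torch

def prune_2_to_4(x: torch.Tensor) -> torch.Tensor:
    """Zero the two smallest-magnitude values in every group of 4.

    Assumes the size of the last dimension is a multiple of 4.
    """
    groups = x.reshape(-1, 4)                      # view as groups of 4 values
    # indices of the two smallest-magnitude entries in each group
    _, drop_idx = groups.abs().topk(2, dim=1, largest=False)
    mask = torch.ones_like(groups)
    mask.scatter_(1, drop_idx, 0.0)                # zero out those entries
    return (groups * mask).reshape(x.shape)

# Example: each group of 4 keeps only its two largest-magnitude values.
x = torch.tensor([[0.3, -1.2, 0.05, 2.0],
                  [0.9,  0.1, -0.4, 0.2]])
print(prune_2_to_4(x))
# tensor([[ 0.0000, -1.2000,  0.0000,  2.0000],
#         [ 0.9000,  0.0000, -0.4000,  0.0000]])
```

Such a mask satisfies the "at least two zeros per four values" constraint that Ampere sparse tensor cores exploit; the paper's contribution lies in applying fixed masks to activations and restoring the discarded information cheaply during training.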

Published

2022-06-28

How to Cite

Xu, W., He, X., Cheng, K., Wang, P., & Cheng, J. (2022). Towards Fully Sparse Training: Information Restoration with Spatial Similarity. Proceedings of the AAAI Conference on Artificial Intelligence, 36(3), 2929-2937. https://doi.org/10.1609/aaai.v36i3.20198

Issue

Vol. 36 No. 3 (2022)

Section

AAAI Technical Track on Computer Vision III