Prior Gradient Mask Guided Pruning-Aware Fine-Tuning

Authors

  • Linhang Cai, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
  • Zhulin An, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  • Chuanguang Yang, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
  • Yangchun Yan, Horizon Robotics Inc, Beijing, China
  • Yongjun Xu, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

DOI:

https://doi.org/10.1609/aaai.v36i1.19888

Keywords:

Computer Vision (CV), Machine Learning (ML)

Abstract

We propose a Prior Gradient Mask Guided Pruning-aware Fine-Tuning (PGMPF) framework to accelerate deep Convolutional Neural Networks (CNNs). During fine-tuning, PGMPF selectively suppresses the gradients of "unimportant" parameters via a prior gradient mask generated by the pruning criterion. PGMPF has three advantages over previous works: (1) Pruning-aware network fine-tuning. A typical pruning pipeline consists of training, pruning and fine-tuning, which are relatively independent, whereas PGMPF uses a variant of the pruning mask as a prior gradient mask to guide fine-tuning, without complicated pruning criteria. (2) An excellent tradeoff between large model capacity during fine-tuning and a stable convergence speed toward the final compact model. Previous works preserve more training information of pruned parameters during fine-tuning to pursue better performance, which can cause catastrophic non-convergence of the pruned model at relatively large pruning rates, whereas PGMPF greatly stabilizes the fine-tuning phase by gradually constraining the learning rate of the "unimportant" parameters. (3) Channel-wise random dropout of the prior gradient mask, which injects gradient noise into fine-tuning to further improve the robustness of the final compact model. Experimental results on three image classification benchmarks, CIFAR-10/100 and ILSVRC-2012, demonstrate the effectiveness of our method across various CNN architectures, datasets and pruning rates. Notably, on ILSVRC-2012, PGMPF reduces the FLOPs of ResNet-50 by 53.5% with only a 0.90% top-1 accuracy drop and a 0.52% top-5 accuracy drop, advancing the state of the art with negligible extra computational cost.
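The abstract does not give the exact formulation, so the following is only a minimal sketch of the idea it describes, not the authors' implementation. It assumes an L2-norm filter ranking as the pruning criterion, a linear decay of the gradient scale applied to "unimportant" filters, and a mask dropout that restores the full gradient for a randomly chosen channel. The names `prior_gradient_mask`, `masked_sgd_step`, `prune_rate` and `drop_prob` are hypothetical and introduced here for illustration.

```python
import math
import random


def l2_norm(filt):
    """L2 norm of one filter given as a flat list of weights."""
    return math.sqrt(sum(w * w for w in filt))


def prior_gradient_mask(filters, prune_rate, epoch, total_epochs,
                        drop_prob=0.0, rng=None):
    """Return one gradient scale per filter (output channel).

    Assumptions (not taken from the paper): L2-norm ranking, a linear
    decay of the scale for "unimportant" filters, and a dropout that
    resets the scale of a randomly chosen channel to 1.0.
    """
    rng = rng or random.Random(0)
    n_pruned = int(len(filters) * prune_rate)
    # Filters with the smallest norms are treated as "unimportant".
    order = sorted(range(len(filters)), key=lambda i: l2_norm(filters[i]))
    unimportant = set(order[:n_pruned])
    # The scale shrinks from 1.0 toward 0.0 as fine-tuning progresses.
    decay = max(0.0, 1.0 - epoch / total_epochs)
    mask = []
    for i in range(len(filters)):
        scale = decay if i in unimportant else 1.0
        if i in unimportant and rng.random() < drop_prob:
            scale = 1.0  # channel-wise dropout of this mask entry
        mask.append(scale)
    return mask


def masked_sgd_step(filters, grads, mask, lr):
    """Plain SGD where each filter's gradient is scaled by its mask entry."""
    return [[w - lr * m * g for w, g in zip(f, gf)]
            for f, gf, m in zip(filters, grads, mask)]


# Toy usage: four filters with two weights each, half of them "unimportant".
filters = [[0.1, -0.1], [1.0, 2.0], [0.5, 0.5], [0.01, 0.02]]
grads = [[1.0, 1.0] for _ in filters]
mask = prior_gradient_mask(filters, prune_rate=0.5, epoch=5, total_epochs=10)
filters = masked_sgd_step(filters, grads, mask, lr=0.1)
```

In this sketch the low-norm filters keep receiving a shrinking share of their gradient instead of being zeroed at once, which is one way to read "gradually constraining the learning rate"; the dropout branch occasionally lets a full gradient through, adding noise to the update.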

Published

2022-06-28

How to Cite

Cai, L., An, Z., Yang, C., Yan, Y., & Xu, Y. (2022). Prior Gradient Mask Guided Pruning-Aware Fine-Tuning. Proceedings of the AAAI Conference on Artificial Intelligence, 36(1), 140-148. https://doi.org/10.1609/aaai.v36i1.19888

Section

AAAI Technical Track on Computer Vision I