Prior Gradient Mask Guided Pruning-Aware Fine-Tuning

Linhang Cai; Zhulin An; Chuanguang Yang; Yangchun Yan; Yongjun Xu

doi:10.1609/aaai.v36i1.19888

Authors

Linhang Cai Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China University of Chinese Academy of Sciences, Beijing, China
Zhulin An Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Chuanguang Yang Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China University of Chinese Academy of Sciences, Beijing, China
Yangchun Yan Horizon Robotics Inc, Beijing, China
Yongjun Xu Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

DOI:

https://doi.org/10.1609/aaai.v36i1.19888

Keywords:

Computer Vision (CV), Machine Learning (ML)

Abstract

We proposed a Prior Gradient Mask Guided Pruning-aware Fine-Tuning (PGMPF) framework to accelerate deep Convolutional Neural Networks (CNNs). In detail, the proposed PGMPF selectively suppresses the gradient of those ”unimportant” parameters via a prior gradient mask generated by the pruning criterion during fine-tuning. PGMPF has three charming characteristics over previous works: (1) Pruning-aware network fine-tuning. A typical pruning pipeline consists of training, pruning and fine-tuning, which are relatively independent, while PGMPF utilizes a variant of the pruning mask as a prior gradient mask to guide fine-tuning, without complicated pruning criteria. (2) An excellent tradeoff between large model capacity during fine-tuning and stable convergence speed to obtain the final compact model. Previous works preserve more training information of pruned parameters during fine-tuning to pursue better performance, which would incur catastrophic non-convergence of the pruned model for relatively large pruning rates, while our PGMPF greatly stabilizes the fine-tuning phase by gradually constraining the learning rate of those ”unimportant” parameters. (3) Channel-wise random dropout of the prior gradient mask to impose some gradient noise to fine-tuning to further improve the robustness of final compact model. Experimental results on three image classification benchmarks CIFAR10/ 100 and ILSVRC-2012 demonstrate the effectiveness of our method for various CNN architectures, datasets and pruning rates. Notably, on ILSVRC-2012, PGMPF reduces 53.5% FLOPs on ResNet-50 with only 0.90% top-1 accuracy drop and 0.52% top-5 accuracy drop, which has advanced the state-of-the-art with negligible extra computational cost.

Prior Gradient Mask Guided Pruning-Aware Fine-Tuning

Authors

DOI:

Keywords:

Abstract

Downloads

Published

How to Cite

Issue

Section

Information

Developed By

Subscription