TY - JOUR
AU - Liu, Xiao
AU - Li, Wenbin
AU - Huo, Jing
AU - Yao, Lili
AU - Gao, Yang
PY - 2020/04/03
Y2 - 2021/03/03
TI - Layerwise Sparse Coding for Pruned Deep Neural Networks with Extreme Compression Ratio
JF - Proceedings of the AAAI Conference on Artificial Intelligence
JA - AAAI
VL - 34
IS - 04
SE - AAAI Technical Track: Machine Learning
DO - 10.1609/aaai.v34i04.5927
UR - https://ojs.aaai.org/index.php/AAAI/article/view/5927
SP - 4900-4907
AB - <p>Deep neural network compression is increasingly important, especially in resource-constrained environments such as autonomous drones and wearable devices. The number of weights in a trained deep model can be greatly reduced with a widely used model compression technique, <em>e.g.,</em> pruning. Two kinds of data are then preserved for the compressed model, <em>i.e.,</em> <em>non-zero weights</em> and <em>meta-data</em>, where the meta-data is used to encode and decode the non-zero weights. Although pruning can yield an ideally small number of non-zero weights, existing sparse matrix coding methods still require a much larger amount of meta-data (possibly several times larger than the non-zero weights), which becomes a severe bottleneck in deploying very deep models. To tackle this issue, we propose a <em>layerwise sparse coding (LSC)</em> method that maximizes the compression ratio by drastically reducing the amount of meta-data. We first divide a sparse matrix into multiple small blocks and remove the zero blocks, and then propose a novel <em>signed relative index</em> (SRI) algorithm to encode the remaining non-zero blocks with much less meta-data. In addition, the proposed LSC performs parallel matrix multiplication without full decoding, which traditional methods cannot. Through extensive experiments, we demonstrate that LSC achieves substantial gains over state-of-the-art baselines in pruned DNN compression (<em>e.g.,</em> a 51.03x compression ratio on ADMM-Lenet) and inference computation (<em>i.e.,</em> reduced time and far lower memory bandwidth).</p>
ER -