TY - JOUR
AU - Liu, Xiao
AU - Li, Wenbin
AU - Huo, Jing
AU - Yao, Lili
AU - Gao, Yang
PY - 2020/04/03
Y2 - 2021/03/03
TI - Layerwise Sparse Coding for Pruned Deep Neural Networks with Extreme Compression Ratio
JF - Proceedings of the AAAI Conference on Artificial Intelligence
JA - AAAI
VL - 34
IS - 04
SE - AAAI Technical Track: Machine Learning
DO - 10.1609/aaai.v34i04.5927
UR - https://ojs.aaai.org/index.php/AAAI/article/view/5927
SP - 4900-4907
AB - <p>Deep neural network compression is increasingly important, especially in resource-constrained environments such as autonomous drones and wearable devices. The number of weights in a trained deep model can be greatly reduced with a widely used model compression technique, <em>e.g.,</em> pruning. Two kinds of data are then preserved for the compressed model, <em>i.e.,</em> <em>non-zero weights</em> and <em>meta-data</em>, where the meta-data is used to encode and decode the non-zero weights. Although pruning can yield an ideally small number of non-zero weights, existing sparse matrix coding methods still require a much larger amount of meta-data (possibly several times larger than the non-zero weights), which becomes a severe bottleneck in deploying very deep models. To tackle this issue, we propose a <em>layerwise sparse coding (LSC)</em> method that maximizes the compression ratio by drastically reducing the amount of meta-data. We first divide a sparse matrix into multiple small blocks and remove the zero blocks, and then propose a novel <em>signed relative index</em> (SRI) algorithm to encode the remaining non-zero blocks with much less meta-data. In addition, the proposed LSC performs parallel matrix multiplication without full decoding, which traditional methods cannot. Through extensive experiments, we demonstrate that LSC achieves substantial gains over state-of-the-art baselines in pruned DNN compression (<em>e.g.,</em> a 51.03x compression ratio on ADMM-Lenet) and inference computation (<em>i.e.,</em> reduced time and far lower memory bandwidth).</p>
ER -