Prune&Comp: Free Lunch for Layer-Pruned LLMs via Iterative Pruning with Magnitude Compensation
DOI:
https://doi.org/10.1609/aaai.v40i24.39120
Abstract
Layer pruning is a viable technique for compressing large language models while achieving acceleration proportional to the pruning ratio. In this work, we identify that removing any layer induces a magnitude gap in the hidden states, and demonstrate that a simple compensation operation leads to superior performance in iterative layer pruning. This key observation motivates us to propose Prune&Comp, a novel, plug-and-play iterative layer pruning scheme that leverages magnitude compensation to mitigate these gaps in a training-free manner. Specifically, we first estimate the magnitude gap caused by a layer's removal and then eliminate it by rescaling the remaining weights offline. We further demonstrate the advantages of Prune&Comp in improving the stability of iterative pruning. When integrated into an iterative prune-and-compensate loop, Prune&Comp consistently enhances existing layer pruning metrics. For instance, when 5 layers of LLaMA-3-8B are pruned with the prevalent Taylor+ metric, Prune&Comp reduces perplexity (PPL) from 512.78 to 16.34 and retains 90.57% of the original performance across 9 question-answering tasks, outperforming the baseline by 24.72%.
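The prune-and-compensate idea described in the abstract can be illustrated with a short, self-contained sketch. The helper names (magnitude_gap, prune_and_compensate), the use of plain torch.nn.Linear blocks, and the choice to rescale the weights of the layer that follows the removed one are illustrative assumptions, not the authors' released implementation; the sketch only shows the general mechanism of estimating a magnitude gap from calibration activations and rescaling remaining weights offline.

import torch

def magnitude_gap(hidden_in, hidden_out):
    # Ratio of mean hidden-state norms after vs. before the candidate layer.
    return (hidden_out.norm(dim=-1).mean() / hidden_in.norm(dim=-1).mean()).item()

@torch.no_grad()
def prune_and_compensate(layers, calib_hidden, drop_idx):
    # Forward the calibration activations up to the layer being removed.
    h = calib_hidden
    for layer in layers[:drop_idx]:
        h = layer(h)
    gap = magnitude_gap(h, layers[drop_idx](h))  # estimated magnitude gap

    # Remove the layer and rescale the next remaining layer's weights offline
    # so the magnitude of the hidden stream is approximately preserved.
    pruned = layers[:drop_idx] + layers[drop_idx + 1:]
    if drop_idx < len(pruned):
        pruned[drop_idx].weight.mul_(gap)
    return pruned

# Toy usage: six linear "layers", drop the third, compensate on random data.
torch.manual_seed(0)
layers = [torch.nn.Linear(16, 16) for _ in range(6)]
pruned = prune_and_compensate(layers, torch.randn(8, 16), drop_idx=2)
print(len(pruned))  # 5 remaining layers, with compensated weights

In an iterative setting, this prune-and-compensate step would be repeated once per removed layer, with the pruning metric re-evaluated on the compensated model at each round.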
Published
2026-03-14
How to Cite
Chen, X., Zhang, H., Zeng, F., Wei, Y., Wang, Y., Ling, X., Li, G., & Yuan, C. (2026). Prune&Comp: Free Lunch for Layer-Pruned LLMs via Iterative Pruning with Magnitude Compensation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(24), 20316-20324. https://doi.org/10.1609/aaai.v40i24.39120
Issue
Vol. 40 No. 24 (2026)
Section
AAAI Technical Track on Machine Learning I