Prune&Comp: Free Lunch for Layer-Pruned LLMs via Iterative Pruning with Magnitude Compensation

Authors

  • Xinrui Chen, Shenzhen International Graduate School, Tsinghua University
  • Hongxing Zhang, School of Information Science and Technology, Guangdong University of Foreign Studies
  • Fanyi Zeng, Shenzhen International Graduate School, Tsinghua University
  • Yongxian Wei, Shenzhen International Graduate School, Tsinghua University
  • Yizhi Wang, Shenzhen International Graduate School, Tsinghua University
  • Xitong Ling, Shenzhen International Graduate School, Tsinghua University
  • Guanghao Li, Shenzhen International Graduate School, Tsinghua University
  • Chun Yuan, Shenzhen International Graduate School, Tsinghua University

DOI:

https://doi.org/10.1609/aaai.v40i24.39120

Abstract

Layer pruning is a viable technique for compressing large language models while achieving acceleration proportional to the pruning ratio. In this work, we identify that removing any layer induces a magnitude gap in hidden states, and demonstrate that a simple compensation operation leads to superior performance in iterative layer pruning. This key observation motivates us to propose Prune&Comp, a novel, plug-and-play iterative layer pruning scheme that leverages magnitude compensation to mitigate such gaps in a training-free manner. Specifically, we first estimate the magnitude gap of layer removal and then eliminate it by rescaling the remaining weights offline. We further demonstrate the advantages of Prune&Comp in improving the stability of iterative pruning. When integrated with an iterative prune-and-compensate loop, Prune&Comp consistently enhances existing layer pruning metrics. For instance, when 5 layers of LLaMA-3-8B are pruned with the prevalent Taylor+ metric, Prune&Comp reduces PPL from 512.78 to 16.34 and retains 90.57% of the original performance across 9 question-answering tasks, outperforming the baseline by 24.72%.
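The core idea in the abstract, estimating the magnitude gap a removed layer leaves in the hidden states and closing it by rescaling the remaining weights offline, can be sketched as below. This is a minimal illustration, not the authors' implementation: the function names, the use of mean L2 norms as the magnitude estimate, and the simple scalar rescaling are all assumptions for the sake of the example.

```python
import numpy as np

def magnitude_gap(h_in: np.ndarray, h_out: np.ndarray) -> float:
    """Estimate the magnitude gap a layer introduces: the ratio of the mean
    L2 norm of its output hidden states to that of its input hidden states.
    (The norm-ratio estimator is an assumption for this sketch.)"""
    norm_in = np.linalg.norm(h_in, axis=-1).mean()
    norm_out = np.linalg.norm(h_out, axis=-1).mean()
    return float(norm_out / norm_in)

def compensate(weight: np.ndarray, gap: float) -> np.ndarray:
    """Rescale a remaining weight matrix offline so the network's hidden-state
    magnitude after pruning matches what the removed layer would have produced."""
    return weight * gap

# Toy usage: a "layer" that doubles hidden-state magnitude is pruned, and a
# downstream weight matrix is rescaled to absorb the resulting gap.
h_in = np.random.randn(16, 64)   # 16 tokens, hidden size 64 (toy shapes)
h_out = 2.0 * h_in               # pretend the pruned layer doubled magnitudes
gap = magnitude_gap(h_in, h_out)
w_next = compensate(np.eye(64), gap)
```

In an iterative prune-and-compensate loop, this estimate-and-rescale step would run once after each layer removal, keeping the hidden-state magnitudes seen by the remaining layers consistent without any retraining.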

Published

2026-03-14

How to Cite

Chen, X., Zhang, H., Zeng, F., Wei, Y., Wang, Y., Ling, X., Li, G., & Yuan, C. (2026). Prune&Comp: Free Lunch for Layer-Pruned LLMs via Iterative Pruning with Magnitude Compensation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(24), 20316-20324. https://doi.org/10.1609/aaai.v40i24.39120

Section

AAAI Technical Track on Machine Learning I