Prune&Comp: Free Lunch for Layer-Pruned LLMs via Iterative Pruning with Magnitude Compensation
DOI:
https://doi.org/10.1609/aaai.v40i24.39120
Abstract
Layer pruning is a viable technique for compressing large language models while achieving acceleration proportional to the pruning ratio. In this work, we identify that removing any layer induces a magnitude gap in the hidden states, and demonstrate that a simple compensation operation leads to superior performance in iterative layer pruning. This key observation motivates us to propose Prune&Comp, a novel, plug-and-play iterative layer pruning scheme that leverages magnitude compensation to mitigate these gaps in a training-free manner. Specifically, we first estimate the magnitude gap caused by a layer's removal and then eliminate it by rescaling the remaining weights offline. We further demonstrate the advantages of Prune&Comp in improving the stability of iterative pruning. When integrated into an iterative prune-and-compensate loop, Prune&Comp consistently enhances existing layer pruning metrics. For instance, when 5 layers of LLaMA-3-8B are pruned with the prevalent Taylor+ metric, Prune&Comp reduces perplexity (PPL) from 512.78 to 16.34 and retains 90.57% of the original performance across 9 question-answering tasks, outperforming the baseline by 24.72%.
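The prune-and-compensate idea described in the abstract can be illustrated with a short, self-contained sketch. The helper names (magnitude_gap, prune_and_compensate), the use of plain torch.nn.Linear blocks, and the choice to rescale the weights of the layer that follows the removed one are illustrative assumptions, not the authors' released implementation; the sketch only shows the general mechanism of estimating a magnitude gap from calibration activations and rescaling remaining weights offline.

import torch

def magnitude_gap(hidden_in, hidden_out):
    # Ratio of mean hidden-state norms after vs. before the candidate layer.
    return (hidden_out.norm(dim=-1).mean() / hidden_in.norm(dim=-1).mean()).item()

@torch.no_grad()
def prune_and_compensate(layers, calib_hidden, drop_idx):
    # Forward the calibration activations up to the layer being removed.
    h = calib_hidden
    for layer in layers[:drop_idx]:
        h = layer(h)
    gap = magnitude_gap(h, layers[drop_idx](h))  # estimated magnitude gap

    # Remove the layer and rescale the next remaining layer's weights offline
    # so the magnitude of the hidden stream is approximately preserved.
    pruned = layers[:drop_idx] + layers[drop_idx + 1:]
    if drop_idx < len(pruned):
        pruned[drop_idx].weight.mul_(gap)
    return pruned

# Toy usage: six linear "layers", drop the third, compensate on random data.
torch.manual_seed(0)
layers = [torch.nn.Linear(16, 16) for _ in range(6)]
pruned = prune_and_compensate(layers, torch.randn(8, 16), drop_idx=2)
print(len(pruned))  # 5 remaining layers, with compensated weights

In an iterative setting, this prune-and-compensate step would be repeated once per removed layer, with the pruning metric re-evaluated on the compensated model at each round.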
Published
2026-03-14
How to Cite
Chen, X., Zhang, H., Zeng, F., Wei, Y., Wang, Y., Ling, X., Li, G., & Yuan, C. (2026). Prune&Comp: Free Lunch for Layer-Pruned LLMs via Iterative Pruning with Magnitude Compensation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(24), 20316-20324. https://doi.org/10.1609/aaai.v40i24.39120
Issue
Vol. 40 No. 24 (2026)
Section
AAAI Technical Track on Machine Learning I