Efficient Plug-and-Play Weight Refinement for Sparse Large Models

Authors

  • Jingcheng Xie University of Science and Technology of China
  • Yinda Chen University of Science and Technology of China
  • Xiaoyu Liu University of Science and Technology of China
  • Yinglong Li University of Science and Technology of China
  • Haoyuan Shi University of Science and Technology of China
  • Zhiwei Xiong University of Science and Technology of China

DOI:

https://doi.org/10.1609/aaai.v40i32.39922

Abstract

One-shot pruning efficiently compresses Large Language Models but produces coarse sparse weights, causing significant performance degradation. Traditional fine-tuning approaches to refine these weights are prohibitively expensive for large models. This highlights the need for a training-free weight refinement method that works seamlessly with one-shot pruning and can efficiently recover the lost performance. To tackle this problem, we propose Efficient Iterative Weight Refinement (EIWR), a lightweight, plug-and-play, and training-free method that refines pruned weights through layer-wise iterative optimization. EIWR achieves efficient weight refinement via three key components: a Global Soft Constraint that eliminates costly row-wise Hessian inversions and expands the solution space; a Historical Momentum Strategy that leverages one-shot pruning priors to accelerate convergence and enhance final performance; and Neumann Series Extrapolation that significantly speeds up per-iteration computation. As a result, EIWR enables effective weight refinement with minimal time and memory overhead. Extensive experiments on LLaMA2/3 and Qwen under different pruning strategies and sparsity levels demonstrate that our method can efficiently refine sparse weights and mitigate performance degradation. For example, on LLaMA2-7B under 70% sparsity, EIWR reduces perplexity by 15% compared with SparseGPT on the WikiText2 benchmark, with only 1.81 additional minutes of computation and 1 GB of additional memory.
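The Neumann Series Extrapolation component builds on a standard identity: when the spectral radius of I − A is below 1, the inverse A⁻¹ equals the series Σₖ (I − A)ᵏ, so a truncated sum approximates the inverse using only matrix products, avoiding an explicit inversion. The sketch below illustrates only this general identity, not the paper's actual algorithm; the matrix and truncation depth are illustrative assumptions.

```python
import numpy as np

def neumann_inverse(A, num_terms=100):
    """Approximate A^{-1} by truncating the Neumann series
    sum_{k>=0} (I - A)^k. Converges when rho(I - A) < 1."""
    n = A.shape[0]
    R = np.eye(n) - A        # residual matrix; convergence needs rho(R) < 1
    approx = np.eye(n)       # k = 0 term
    term = np.eye(n)
    for _ in range(num_terms):
        term = term @ R      # (I - A)^k, accumulated by repeated products
        approx += term
    return approx

# Illustrative well-conditioned matrix with rho(I - A) < 1
A = np.array([[0.9, 0.1],
              [0.05, 0.8]])
A_inv_approx = neumann_inverse(A)
err = np.max(np.abs(A_inv_approx @ A - np.eye(2)))
```

With enough terms, `A_inv_approx @ A` approaches the identity; in practice a few terms already give a useful approximation when the residual's spectral radius is small, which is what makes series-based extrapolation cheap per iteration.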

Published

2026-03-14

How to Cite

Xie, J., Chen, Y., Liu, X., Li, Y., Shi, H., & Xiong, Z. (2026). Efficient Plug-and-Play Weight Refinement for Sparse Large Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(32), 27081–27089. https://doi.org/10.1609/aaai.v40i32.39922

Section

AAAI Technical Track on Machine Learning IX