D2 Prune: Sparsifying Large Language Models via Dual Taylor Expansion and Attention Distribution Awareness

Authors

  • Lang Xiong, Chongqing University
  • Ning Liu, Beijing Innovation Center of Humanoid Robotics
  • Ao Ren, Chongqing University
  • Yuheng Bai, Chongqing University
  • Haining Fang, Chongqing University
  • Binyan Zhang, Chongqing University
  • Zhe Jiang, Chongqing University
  • Yujuan Tan, National University of Defense Technology
  • Duo Liu, Chongqing University

DOI:

https://doi.org/10.1609/aaai.v40i32.39932

Abstract

Large language models (LLMs) face significant deployment challenges due to their massive computational demands. While pruning offers a promising compression solution, existing methods suffer from two critical limitations: (1) they neglect activation distribution shifts between calibration data and test data, resulting in inaccurate error estimates; and (2) they overlook the long-tail distribution of activations in the attention module. To address these limitations, this paper proposes a novel pruning method, D²Prune. First, we propose a dual Taylor expansion-based method that jointly models weight and activation perturbations for precise error estimation, which in turn yields more precise pruning mask selection and weight updates and helps minimize the error introduced by pruning. Second, we propose an attention-aware dynamic update strategy that preserves the long-tail attention pattern by jointly minimizing the KL divergence of attention distributions and the reconstruction error. Extensive experiments show that D²Prune consistently outperforms state-of-the-art (SOTA) methods across various LLMs (e.g., OPT-125M, LLaMA2/3, Qwen3). Moreover, the dynamic attention update mechanism generalizes well to ViT-based vision models such as DeiT, achieving superior accuracy on ImageNet-1K.
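
To make the second idea in the abstract concrete, the following is a minimal NumPy sketch of an objective that combines the KL divergence between dense and pruned attention distributions with a reconstruction error. It is an illustration under stated assumptions, not the authors' implementation: the function names, the weighting factor `lam`, the single-head setup without a value projection, and the use of plain magnitude pruning as a stand-in for the paper's mask selection are all hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_probs(x, w_q, w_k):
    # Single-head attention distribution, one row per query token.
    q = x @ w_q
    k = x @ w_k
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1)

def magnitude_prune(w, sparsity):
    # Stand-in pruning rule (NOT the paper's mask selection):
    # zero out the smallest-magnitude fraction of entries.
    k = int(round(sparsity * w.size))
    if k == 0:
        return w.copy()
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w)

def kl_divergence(p, q, eps=1e-12):
    # Mean over query rows of KL(p || q).
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

def joint_objective(x, w_q, w_k, w_q_pruned, w_k_pruned, lam=1.0):
    # Hypothetical combined loss: attention KL + lam * projection reconstruction error.
    p_dense = attention_probs(x, w_q, w_k)
    p_pruned = attention_probs(x, w_q_pruned, w_k_pruned)
    kl = kl_divergence(p_dense, p_pruned)
    recon = float(
        np.mean((x @ w_q - x @ w_q_pruned) ** 2)
        + np.mean((x @ w_k - x @ w_k_pruned) ** 2)
    )
    return kl + lam * recon, kl, recon

# Toy calibration batch: 8 tokens with hidden size 16 (arbitrary values).
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
w_q = rng.normal(size=(16, 16)) / 4.0
w_k = rng.normal(size=(16, 16)) / 4.0
w_q_pruned = magnitude_prune(w_q, 0.5)
w_k_pruned = magnitude_prune(w_k, 0.5)

total, kl, recon = joint_objective(x, w_q, w_k, w_q_pruned, w_k_pruned)
print(f"KL = {kl:.4f}, reconstruction = {recon:.4f}, total = {total:.4f}")
```

In a sketch like this, a weight update that lowers only the reconstruction term can still distort the attention rows; adding the KL term penalizes changes to the shape of each row's distribution, which is the property the abstract describes as preserving the long-tail attention pattern.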

Published

2026-03-14

How to Cite

Xiong, L., Liu, N., Ren, A., Bai, Y., Fang, H., Zhang, B., … Liu, D. (2026). D2 Prune: Sparsifying Large Language Models via Dual Taylor Expansion and Attention Distribution Awareness. Proceedings of the AAAI Conference on Artificial Intelligence, 40(32), 27171–27179. https://doi.org/10.1609/aaai.v40i32.39932

Section

AAAI Technical Track on Machine Learning IX