SEAP: Sparse Expert Activation Pruning Unlocks the Brainpower of Large Language Models

Authors

  • Xun Liang, Renmin University of China
  • Hanyu Wang, Renmin University of China
  • Huayi Lai, Renmin University of China
  • Simin Niu, Renmin University of China
  • Shichao Song, Renmin University of China
  • Jiawei Yang, Renmin University of China
  • Jihao Zhao, Renmin University of China
  • Feiyu Xiong, Institute for Advanced Algorithms Research (Shanghai); MemTensor (Shanghai) Technology Co., Ltd.
  • Bo Tang, Institute for Advanced Algorithms Research (Shanghai); MemTensor (Shanghai) Technology Co., Ltd.
  • Zhiyu Li, Institute for Advanced Algorithms Research (Shanghai); MemTensor (Shanghai) Technology Co., Ltd.

DOI:

https://doi.org/10.1609/aaai.v40i38.40463

Abstract

Pruning is a promising approach to reduce the high inference cost of large language models (LLMs), but it often comes at the expense of performance. Motivated by the "functional localization" theory in neuroscience, we hypothesize that LLMs contain task-specific expert activation paths, where specific subsets of neurons are co-activated for particular tasks. This structure allows selective activation to preserve task performance while improving inference efficiency. We introduce Sparse Expert Activation Pruning (SEAP), a training-free pruning method for large language models. SEAP identifies task-relevant activation paths by analyzing the clustering patterns of hidden states and neuron activations on a multi-task calibration dataset. Cross-task transfer evaluations confirm the existence of such expert activation structures. SEAP constructs task-aware pruning masks by leveraging a task-expert calibration dataset, which provides representative samples across diverse tasks to reveal their activation signatures. It then employs a lightweight task router to dynamically select relevant computation paths based on the input task. This design significantly reduces inference cost without compromising accuracy. Experimental results show that SEAP retains model performance with only a 1.5% drop on most tasks at 20% sparsity, and at 50% sparsity, it surpasses strong pruning baselines such as WandA and FLAP by over 20%. These results highlight SEAP as a scalable and effective solution for efficient LLM inference.
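
As a rough illustration of the idea described in the abstract (task-aware pruning masks derived from calibration activations, followed by routing on the input task), the following minimal sketch may help. It is not the authors' implementation: the importance score (mean absolute activation), the function names `build_task_masks` and `route_and_apply`, the task labels, and the label-lookup "router" are all assumptions made purely for illustration, and the calibration data is synthetic.

```python
import numpy as np


def build_task_masks(activations_by_task, sparsity):
    """Return a boolean keep-mask per task (hypothetical helper).

    activations_by_task: dict mapping task name -> array of shape
        (num_samples, num_neurons) holding calibration activations.
    sparsity: fraction of neurons to prune (0.2 prunes 20%).
    """
    masks = {}
    for task, acts in activations_by_task.items():
        # Assumed importance score: mean absolute activation per neuron.
        score = np.abs(acts).mean(axis=0)
        num_keep = score.size - int(round(sparsity * score.size))
        # Keep the highest-scoring neurons for this task.
        keep_idx = np.argsort(score)[::-1][:num_keep]
        mask = np.zeros(score.size, dtype=bool)
        mask[keep_idx] = True
        masks[task] = mask
    return masks


def route_and_apply(hidden, task, masks):
    """Hypothetical router: look up the mask for a known task label."""
    return hidden * masks[task]


# Synthetic calibration data: each task strongly activates a different
# half of a 10-neuron layer.
rng = np.random.default_rng(0)
calib = {
    "math": rng.normal(size=(32, 10)) * np.array([3.0] * 5 + [0.1] * 5),
    "code": rng.normal(size=(32, 10)) * np.array([0.1] * 5 + [3.0] * 5),
}
masks = build_task_masks(calib, sparsity=0.5)
pruned = route_and_apply(np.ones(10), "math", masks)
```

In this toy setup each task keeps a different half of the neurons, which mirrors the hypothesis that distinct tasks rely on distinct activation paths. A real router would have to infer the task from the input rather than receive a label.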

Published

2026-03-14

How to Cite

Liang, X., Wang, H., Lai, H., Niu, S., Song, S., Yang, J., … Li, Z. (2026). SEAP: Sparse Expert Activation Pruning Unlocks the Brainpower of Large Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(38), 31934–31942. https://doi.org/10.1609/aaai.v40i38.40463

Section

AAAI Technical Track on Natural Language Processing III