SepPrune: Structured Pruning for Efficient Deep Speech Separation

Authors

  • Yuqi Li, The City College of New York, CUNY
  • Kai Li, Tsinghua University
  • Xin Yin, Zhejiang University
  • Zhifei Yang, Peking University
  • Zeyu Dong, Boston University
  • Zhengtao Yao, University of Southern California
  • Haoyan Xu, University of Southern California
  • Yingli Tian, The City College of New York, CUNY
  • Yao Lu, Institute of Cyberspace Security, Zhejiang University of Technology; CFAR, Agency for Science, Technology and Research (A*STAR)

DOI:

https://doi.org/10.1609/aaai.v40i38.40455

Abstract

Although deep learning has substantially advanced speech separation in recent years, most existing studies continue to prioritize separation quality while overlooking computational efficiency, an essential factor for low-latency speech processing in real-time applications. In this paper, we propose SepPrune, the first structured pruning framework specifically designed to compress deep speech separation models and reduce their computational cost. SepPrune begins by analyzing the computational structure of a given model to identify the layers with the highest computational burden. It then introduces a differentiable masking strategy to enable gradient-driven channel selection. Based on the learned masks, SepPrune prunes redundant channels and fine-tunes the remaining parameters to recover performance. Extensive experiments demonstrate that this learnable pruning paradigm yields substantial advantages for channel pruning in speech separation models, outperforming existing methods. Notably, a model pruned with SepPrune can recover 85% of the performance of a pre-trained model (trained over hundreds of epochs) with only one epoch of fine-tuning, and can achieve convergence 36× faster than training from scratch.
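The abstract describes a differentiable masking strategy: each channel gets a learnable gate that scales its output during training, so gradients can drive channel selection, and channels whose gate stays low are pruned afterward. The following is a minimal NumPy sketch of that selection-and-pruning step only; the function names, the sigmoid gating, and the 0.5 threshold are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def prune_channels(weight, mask_logits, threshold=0.5):
    """Keep only the channels whose learned gate exceeds `threshold`.

    weight:      (out_channels, in_channels) layer weight matrix.
    mask_logits: (out_channels,) learnable scores. During training the
                 soft gate sigmoid(mask_logits) multiplies each channel's
                 output, so gradients flow into the selection (the
                 differentiable-masking idea); here we only apply the
                 final hard selection.
    """
    gates = sigmoid(mask_logits)   # soft, differentiable gates in (0, 1)
    keep = gates > threshold       # hard keep/drop decision after training
    return weight[keep], keep

# Hypothetical example: a layer with 8 output channels and 4 inputs.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))
logits = np.array([2.0, -3.0, 1.5, -1.0, 0.8, -2.5, 3.0, -0.2])

W_pruned, kept = prune_channels(W, logits)
print(W_pruned.shape)  # only channels with gate > 0.5 survive: (4, 4)
```

After pruning, the surviving weights would be fine-tuned to recover performance, which the paper reports takes as little as one epoch.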

Published

2026-03-14

How to Cite

Li, Y., Li, K., Yin, X., Yang, Z., Dong, Z., Yao, Z., … Lu, Y. (2026). SepPrune: Structured Pruning for Efficient Deep Speech Separation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(38), 31861–31869. https://doi.org/10.1609/aaai.v40i38.40455

Section

AAAI Technical Track on Natural Language Processing III