BSDB-Net: Band-Split Dual-Branch Network with Selective State Spaces Mechanism for Monaural Speech Enhancement

Authors

  • Cunhang Fan, Anhui Province Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University, Hefei, China
  • Enrui Liu, Anhui Province Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University, Hefei, China
  • Andong Li, Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
  • Jianhua Tao, Department of Automation, Tsinghua University, Beijing, China
  • Jian Zhou, Anhui Province Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University, Hefei, China
  • Jiahao Li, Anhui Province Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University, Hefei, China
  • Chengshi Zheng, Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
  • Zhao Lv, Anhui Province Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University, Hefei, China

DOI:

https://doi.org/10.1609/aaai.v39i22.34557

Abstract

Although complex spectrum-based speech enhancement (SE) methods have achieved strong performance, coupling amplitude and phase can lead to a compensation effect, in which amplitude information is sacrificed to compensate for the phase, which is harmful to SE. In addition, many modules are stacked onto SE models to further improve performance, and the resulting increase in model complexity limits their application. To address these problems, we propose a dual-path network based on compressed frequency using Mamba. First, we extract amplitude and phase information through parallel dual branches. This approach leverages structured complex spectra to implicitly capture phase information and resolves the compensation effect by decoupling amplitude and phase. The network also incorporates an interaction module to suppress unnecessary parts and recover missing components from the other branch. Second, to reduce network complexity, the network introduces a band-split strategy that compresses the frequency dimension. To further reduce complexity while maintaining good performance, we design a Mamba-based module that models the time and frequency dimensions with linear complexity. Finally, compared to baselines, our model achieves an average 8.3-fold reduction in computational complexity while maintaining superior performance. Furthermore, it achieves a 25-fold reduction in complexity compared to transformer-based models.
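
The band-split idea mentioned in the abstract, compressing the frequency dimension by grouping bins into subbands, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the 16-bin frame, the band widths, the function names `split_bands` and `compress_magnitude`, and the per-band mean, which stands in for whatever learned per-band projection the network actually uses.

```python
def split_bands(num_bins, band_widths):
    """Return (start, stop) index pairs that partition num_bins frequency bins."""
    if sum(band_widths) != num_bins:
        raise ValueError("band widths must sum to the number of bins")
    edges, start = [], 0
    for width in band_widths:
        edges.append((start, start + width))
        start += width
    return edges


def compress_magnitude(magnitude, edges):
    """Hypothetical stand-in for a learned per-band projection: mean per band."""
    return [sum(magnitude[a:b]) / (b - a) for a, b in edges]


# Hypothetical complex spectrum frame with 16 frequency bins.
frame = [complex(k + 1, 0) for k in range(16)]

# Assumed layout: narrow bands at low frequencies, wider bands at high ones.
band_widths = [1, 1, 2, 4, 8]
edges = split_bands(len(frame), band_widths)

# Magnitude of each bin; the sketch ignores phase entirely.
magnitude = [abs(z) for z in frame]
band_features = compress_magnitude(magnitude, edges)

# The frequency axis shrinks from 16 bins to 5 band features.
print(len(magnitude), "->", len(band_features))
```

In this toy setting any later module that scans the frequency axis would process 5 positions instead of 16, which is the kind of saving a band-split strategy aims for. The sketch does not model the dual branches, the interaction module, or the Mamba-based module.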

Published

2025-04-11

How to Cite

Fan, C., Liu, E., Li, A., Tao, J., Zhou, J., Li, J., … Lv, Z. (2025). BSDB-Net: Band-Split Dual-Branch Network with Selective State Spaces Mechanism for Monaural Speech Enhancement. Proceedings of the AAAI Conference on Artificial Intelligence, 39(22), 23850–23858. https://doi.org/10.1609/aaai.v39i22.34557

Issue

Section

AAAI Technical Track on Natural Language Processing I