Selector-Enhancer: Learning Dynamic Selection of Local and Non-local Attention Operation for Speech Enhancement

Authors

  • Xinmeng Xu, National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, China
  • Weiping Tu, National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, China; Hubei Luojia Laboratory, China
  • Yuhong Yang, National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, China; Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, China

DOI:

https://doi.org/10.1609/aaai.v37i11.26622

Keywords:

SNLP: Speech and Multimodality, ML: Reinforcement Learning Algorithms

Abstract

Attention mechanisms, such as local and non-local attention, play a fundamental role in recent deep-learning-based speech enhancement (SE) systems. However, natural speech contains many fast-changing and relatively brief acoustic events, so capturing the most informative speech features by indiscriminately applying local and non-local attention is challenging. We observe that the noise type and speech features vary within a speech sequence, and that local and non-local attention are each suited to processing different types of corrupted speech regions. To leverage this, we propose Selector-Enhancer, a dual-attention-based convolutional neural network (CNN) with a feature-filter that dynamically selects regions from low-resolution speech features and feeds them to local or non-local attention operations. In particular, the proposed feature-filter is trained using reinforcement learning (RL) with a difficulty-regulated reward that is related to network performance, model complexity, and “the difficulty of the SE task”. The results show that our method achieves comparable or superior performance to existing approaches. In particular, Selector-Enhancer is effective for real-world denoising, where the number and types of noise vary within a single noisy mixture.
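The routing idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`attend`, `select_and_enhance`, `toy_reward`), the plain dot-product attention, the rule-based selector standing in for the RL-trained feature-filter, and the reward formula are all assumptions made for illustration. The abstract states only that the reward relates to network performance, model complexity, and task difficulty, not how these are combined.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(frames, window=None):
    """Dot-product self-attention over a list of feature vectors.

    window=None -> non-local: each frame attends to every frame.
    window=k    -> local: each frame attends to frames within k steps.
    """
    n = len(frames)
    out = []
    for i, query in enumerate(frames):
        lo = 0 if window is None else max(0, i - window)
        hi = n if window is None else min(n, i + window + 1)
        keys = frames[lo:hi]
        weights = softmax([sum(a * b for a, b in zip(query, k)) for k in keys])
        out.append([
            sum(w * k[d] for w, k in zip(weights, keys))
            for d in range(len(query))
        ])
    return out


def select_and_enhance(regions, selector, window=1):
    """Route each region to local or non-local attention.

    `selector` is a hypothetical stand-in for the learned feature-filter;
    it must return "local" or "non_local" for a region.
    """
    outputs, choices = [], []
    for region in regions:
        choice = selector(region)
        choices.append(choice)
        outputs.append(attend(region, window if choice == "local" else None))
    return outputs, choices


def toy_reward(quality_gain, cost, difficulty, cost_weight=0.1):
    """Assumed reward shape: quality gain minus a complexity penalty,
    scaled by difficulty. The paper's actual formula is not given here."""
    return difficulty * (quality_gain - cost_weight * cost)
```

For example, `select_and_enhance(regions, lambda r: "local" if len(r) > 2 else "non_local")` sends regions longer than two frames to windowed attention and shorter ones to full attention. In the method described in the abstract this decision is learned with reinforcement learning rather than fixed by a rule.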

Published

2023-06-26

How to Cite

Xu, X., Tu, W., & Yang, Y. (2023). Selector-Enhancer: Learning Dynamic Selection of Local and Non-local Attention Operation for Speech Enhancement. Proceedings of the AAAI Conference on Artificial Intelligence, 37(11), 13853-13860. https://doi.org/10.1609/aaai.v37i11.26622

Issue

Section

AAAI Technical Track on Speech & Natural Language Processing