Deep Representation-Decoupling Neural Networks for Monaural Music Mixture Separation

Zhuo Li; Hongwei Wang; Miao Zhao; Wenjie Li; Minyi Guo

doi:10.1609/aaai.v32i1.11300

Authors

Zhuo Li The Hong Kong Polytechnic University
Hongwei Wang Shanghai Jiao Tong University
Miao Zhao The Hong Kong Polytechnic University
Wenjie Li The Hong Kong Polytechnic University
Minyi Guo Shanghai Jiao Tong University

Abstract

Monaural source separation (MSS) aims to extract and reconstruct different sources from a single-channel mixture, which could facilitate a variety of applications such as chord recognition, pitch estimation and automatic transcription. In this paper, we study the problem of separating vocals and instruments from a monaural music mixture. Existing approaches to monaural source separation either rely on linear, shallow models (e.g., non-negative matrix factorization) or do not explicitly address the coupling and entanglement of multiple sources in the original input signal, and therefore do not perform satisfactorily in real-world scenarios. To overcome these limitations, we propose a novel end-to-end framework for monaural music mixture separation called Deep Representation-Decoupling Neural Networks (DRDNN). DRDNN takes advantage of both traditional signal processing methods and popular deep learning models. DRDNN converts each input music mixture to a two-dimensional time-frequency spectrogram using the short-time Fourier transform (STFT), then applies stacked convolutional neural network (CNN) layers and long short-term memory (LSTM) layers to extract more condensed features. Afterwards, DRDNN uses a decoupling component, which consists of a group of multi-layer perceptrons (MLPs), to further decouple the features into separate sources. The decoupling component produces purified single-source signals for subsequent full-size restoration and can significantly improve the performance of the final separation. Through extensive experiments on a real-world dataset, we show that DRDNN outperforms state-of-the-art baselines in the task of monaural music mixture separation and reconstruction.
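
The first stage described in the abstract, converting a mixture to a time-frequency spectrogram with the STFT, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the frame length of 8 samples, the hop of 4 samples, and the Hann window are all assumptions chosen for illustration, since the abstract does not state any STFT parameters. The CNN, LSTM, and MLP stages are not shown.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n (window choice is an assumption)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal, frame_len=8, hop=4):
    """Magnitude spectrogram as a list of frames (time) by bins (frequency).

    frame_len and hop are illustrative values, not taken from the paper.
    Each frame keeps the frame_len // 2 + 1 non-redundant bins of a real signal.
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    spectrogram = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Window the frame to reduce spectral leakage at the frame edges.
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Naive DFT of one bin; fine for a toy example, slow for real audio.
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            row.append(abs(acc))
        spectrogram.append(row)
    return spectrogram


# Toy single-channel "mixture": a pure tone that falls exactly on bin 2.
mixture = [math.sin(2.0 * math.pi * 2 * n / 8) for n in range(32)]
spectrogram = stft_magnitude(mixture)
```

Each row of `spectrogram` is one time frame, and the energy of the toy tone concentrates in a single frequency bin. A two-dimensional array of this kind is what the abstract describes as the input to the stacked CNN and LSTM layers.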
