Positions, Channels, and Layers: Fully Generalized Non-Local Network for Singer Identification

Authors

  • I-Yuan Kuo, Academia Sinica, Taiwan
  • Wen-Li Wei, Academia Sinica, Taiwan
  • Jen-Chun Lin, Academia Sinica, Taiwan

Keywords

Classification and Regression

Abstract

Recently, a non-local (NL) operation has been designed as the central building block for deep-net models to capture long-range dependencies (Wang et al. 2018). Despite its excellent performance, it does not consider the interaction between positions across channels and layers, which is crucial in fine-grained classification tasks. To address this limitation, we target the singer identification (SID) task and present a fully generalized non-local (FGNL) module to help identify fine-grained vocals. Specifically, we first propose an FGNL operation, which extends the NL operation to explore the correlations between positions across channels and layers. Second, we apply a depth-wise convolution with a Gaussian kernel in the FGNL operation to smooth feature maps for better generalization. Moreover, we incorporate a modified squeeze-and-excitation (SE) scheme into the FGNL module to adaptively emphasize correlated feature channels, which helps uncover relevant feature responses and, eventually, the target singer. Evaluation results on the benchmark artist20 dataset show that the FGNL module significantly improves the accuracy of deep-net models in SID. Code is available at https://github.com/ian-k-1217/Fully-Generalized-Non-Local-Network.
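For reference, the original NL operation that FGNL extends (Wang et al. 2018) can be sketched in its embedded-Gaussian form: every position i aggregates the features of all positions j, weighted by a softmax over the affinities theta_i · phi_j, and the result is added back to the input through a residual connection. This is a minimal NumPy sketch of that baseline NL operation only, not the FGNL operation proposed here; the function names, the (C channels x N flattened time-frequency positions) layout, and the random weights are illustrative assumptions. Note that a single N x N affinity is computed and shared by every channel of g, so position-to-position interactions are not modeled separately per channel.

```python
import numpy as np

def softmax(a, axis=-1):
    """Numerically stable softmax along one axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def non_local(x, w_theta, w_phi, w_g, w_z):
    """Embedded-Gaussian non-local block (baseline NL, not FGNL).

    x: (C, N) feature map flattened over N positions (e.g. time x frequency).
    w_theta, w_phi, w_g: (C_inner, C) 1x1 projections (hypothetical weights).
    w_z: (C, C_inner) output projection.
    Returns z = w_z @ y + x with shape (C, N).
    """
    theta = w_theta @ x                     # (C_inner, N)
    phi = w_phi @ x                         # (C_inner, N)
    g = w_g @ x                             # (C_inner, N)
    attn = softmax(theta.T @ phi, axis=1)   # (N, N): row i weights every position j
    y = g @ attn.T                          # y[:, i] = sum_j attn[i, j] * g[:, j]
    return w_z @ y + x                      # residual connection

# Usage with a toy 8-channel map over 5 x 6 = 30 positions.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 30))
wt, wp, wg = (rng.standard_normal((4, 8)) for _ in range(3))
wz = rng.standard_normal((8, 4))
z = non_local(x, wt, wp, wg, wz)            # shape (8, 30)
```

Because the affinity is computed from content alone, permuting the positions of the input simply permutes the output columns; the block has no built-in notion of where a position lies.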
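To illustrate the smoothing step, a depth-wise convolution applies the same fixed Gaussian kernel to each channel separately, so channels are never mixed with one another. The NumPy sketch below assumes a (C, H, W) feature map, an odd kernel size of 3, sigma = 1.0, and zero padding; the kernel size, sigma, and the exact point inside the FGNL operation where smoothing is applied are not stated in the abstract, and the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(size=3, sigma=1.0):
    """Normalized 2D Gaussian kernel of shape (size, size); entries sum to 1."""
    r = np.arange(size) - (size - 1) / 2.0
    k1d = np.exp(-(r ** 2) / (2.0 * sigma ** 2))
    k2d = np.outer(k1d, k1d)
    return k2d / k2d.sum()

def depthwise_gaussian(x, size=3, sigma=1.0):
    """Smooth each channel of x (C, H, W) independently with one Gaussian kernel.

    Zero padding keeps the spatial size unchanged ("same" output). The kernel is
    symmetric, so this cross-correlation equals a true convolution.
    """
    assert size % 2 == 1, "odd kernel size assumed for symmetric padding"
    k = gaussian_kernel(size, sigma)
    pad = size // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # pad spatial axes only
    out = np.zeros_like(x, dtype=float)
    H, W = x.shape[1:]
    for di in range(size):
        for dj in range(size):
            # Shifted window of every channel at once; no cross-channel sum.
            out += k[di, dj] * xp[:, di:di + H, dj:dj + W]
    return out
```

Since the kernel weights sum to 1, a constant region passes through unchanged while isolated spikes are spread over their neighbors, which is the kind of smoothing the abstract credits with better generalization.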
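The abstract says the SE scheme is modified for the FGNL module but does not describe the modification, so the NumPy sketch below shows only the standard squeeze-and-excitation recalibration it starts from: global average pooling squeezes each channel to one number, two fully connected layers (ReLU, then sigmoid) produce a weight in (0, 1) per channel, and each channel is rescaled by its weight. The (C, H, W) layout, the reduction ratio implied by the weight shapes, and the function name are assumptions for illustration.

```python
import numpy as np

def squeeze_excitation(x, w1, b1, w2, b2):
    """Standard SE channel recalibration (not the paper's modified variant).

    x: (C, H, W) feature map.
    w1: (C // r, C), b1: (C // r,), w2: (C, C // r), b2: (C,) for reduction ratio r.
    Returns the rescaled map and the per-channel weights s in (0, 1).
    """
    z = x.mean(axis=(1, 2))                     # squeeze: global average pool -> (C,)
    h = np.maximum(w1 @ z + b1, 0.0)            # excitation FC 1 + ReLU -> (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))    # FC 2 + sigmoid -> (C,)
    return x * s[:, None, None], s              # emphasize or suppress each channel
```

With all weights and biases at zero every channel receives the neutral weight 0.5; training moves these weights so that channels carrying informative responses are scaled up relative to the rest.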

Published

2021-05-18

How to Cite

Kuo, I.-Y., Wei, W.-L., & Lin, J.-C. (2021). Positions, Channels, and Layers: Fully Generalized Non-Local Network for Singer Identification. Proceedings of the AAAI Conference on Artificial Intelligence, 35(9), 8217-8225. Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/17000

Issue

Vol. 35 No. 9 (2021)

Section

AAAI Technical Track on Machine Learning II