Positions, Channels, and Layers: Fully Generalized Non-Local Network for Singer Identification
Keywords: Classification and Regression
Abstract
Recently, a non-local (NL) operation has been designed as the central building block for deep-net models to capture long-range dependencies (Wang et al. 2018). Despite its excellent performance, it does not consider the interaction between positions across channels and layers, which is crucial in fine-grained classification tasks. To address this limitation, we target the singer identification (SID) task and present a fully generalized non-local (FGNL) module to help identify fine-grained vocals. Specifically, we first propose an FGNL operation, which extends the NL operation to explore the correlations between positions across channels and layers. Second, we apply a depth-wise convolution with a Gaussian kernel in the FGNL operation to smooth feature maps for better generalization. Moreover, we incorporate a modified squeeze-and-excitation (SE) scheme into the FGNL module to adaptively emphasize correlated feature channels, which helps uncover relevant feature responses and, eventually, the target singer. Evaluation results on the benchmark artist20 dataset show that the FGNL module significantly improves the accuracy of deep-net models in SID. Code is available at https://github.com/ian-k-1217/Fully-Generalized-Non-Local-Network.
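For readers unfamiliar with the baseline that the FGNL operation extends, the following is a minimal sketch of a plain NL operation over positions only, using the Gaussian (softmax-normalized dot-product) pairwise function from Wang et al. (2018). It is an illustration under simplifying assumptions, not the authors' FGNL implementation: the learned embeddings and output projection are omitted, the cross-channel and cross-layer correlations described above are not modeled, and the names `softmax` and `non_local` are hypothetical helpers introduced here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def non_local(x):
    """Simplified NL operation (assumption: no learned projections).

    x is a list of N positions, each a list of C feature values.
    For every position i, the response is a weighted sum over ALL
    positions j, with weights softmax_j(x_i . x_j), added back to x_i
    as a residual connection.
    """
    n = len(x)
    out = []
    for i in range(n):
        # Pairwise similarity between position i and every position j.
        scores = [sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)]
        weights = softmax(scores)
        # Aggregate features from all positions, channel by channel.
        y = [
            sum(weights[j] * x[j][c] for j in range(n))
            for c in range(len(x[i]))
        ]
        # Residual connection: output keeps the input shape.
        out.append([xi + yi for xi, yi in zip(x[i], y)])
    return out
```

Calling `non_local([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])` returns a list with the same shape as its input, where each position has been updated with information from every other position. Note that the attention weights here couple positions only within the same feature map; relating positions across channels and layers is the gap the FGNL operation is described as addressing.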
How to Cite
Kuo, I.-Y., Wei, W.-L., & Lin, J.-C. (2021). Positions, Channels, and Layers: Fully Generalized Non-Local Network for Singer Identification. Proceedings of the AAAI Conference on Artificial Intelligence, 35(9), 8217-8225. https://doi.org/10.1609/aaai.v35i9.17000
AAAI Technical Track on Machine Learning II