Audio Scanning Network: Bridging Time and Frequency Domains for Audio Classification

Liangwei Chen; Xiren Zhou; Huanhuan Chen

doi:10.1609/aaai.v38i10.29015

Authors

Liangwei Chen, School of Data Science, University of Science and Technology of China
Xiren Zhou, School of Computer Science and Technology, University of Science and Technology of China
Huanhuan Chen, School of Computer Science and Technology, University of Science and Technology of China

Keywords:

ML: Time-Series/Data Streams, ML: Kernel Methods

Abstract

With the rapid growth of audio data, there is a pressing need for automatic audio classification. As a type of time-series data, audio exhibits waveform fluctuations in both the time and frequency domains that evolve over time, and similar instances share consistent patterns. This study introduces the Audio Scanning Network (ASNet), designed to exploit this rich information for stable and effective audio classification. ASNet uses reservoir computing to capture real-time changes in audio waveforms in both the time and frequency domains, and Reservoir Kernel Canonical Correlation Analysis (RKCCA) to explore the correlations between time-domain and frequency-domain waveform fluctuations. This approach enables ASNet to capture both the changes and the inherent correlations within the audio waveform without time-consuming iterative training. Instead of converting audio into spectrograms, ASNet directly uses audio feature sequences to uncover associations between time and frequency fluctuations. Experiments on environmental sound and music genre classification tasks show that ASNet achieves performance comparable to state-of-the-art methods.
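
The abstract does not give ASNet's equations, so the following is only a minimal sketch of the two ideas it names, under stated assumptions, and not the authors' implementation. A standard echo state network (a fixed random tanh reservoir) scans per-frame time-domain features and frequency-domain features. A textbook regularized kernel CCA with an RBF kernel then measures how strongly the two reservoir state sequences are correlated. The feature choices (RMS, zero-crossing rate, log band energies), the reservoir size, spectral radius, kappa, the toy chirp signal, and every function name (frame_signal, time_and_freq_features, Reservoir, kcca_top_correlation) are hypothetical illustrations.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames of shape (n_frames, frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def time_and_freq_features(x, frame_len=256, hop=128, n_bands=8):
    """Per-frame feature sequences for the two domains (hypothetical choices).

    Time domain: RMS energy and zero-crossing rate.
    Frequency domain: log energy in n_bands equal-width FFT bands.
    """
    frames = frame_signal(x, frame_len, hop)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    time_feats = np.stack([rms, zcr], axis=1)
    mag = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))
    bands = np.array_split(mag, n_bands, axis=1)
    freq_feats = np.log1p(np.stack([b.sum(axis=1) for b in bands], axis=1))
    return time_feats, freq_feats


def standardize(z):
    """Zero-mean, unit-variance scaling per feature column."""
    return (z - z.mean(axis=0)) / (z.std(axis=0) + 1e-8)


class Reservoir:
    """Fixed random echo state network: its weights are never trained."""

    def __init__(self, n_inputs, n_units=100, spectral_radius=0.9,
                 input_scale=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.w_in = input_scale * rng.uniform(-1, 1, (n_units, n_inputs))
        w = rng.uniform(-1, 1, (n_units, n_units))
        # Rescale the recurrent matrix so its largest |eigenvalue| is spectral_radius.
        self.w = w * spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))

    def run(self, inputs):
        """Scan a (T, n_inputs) sequence step by step; return (T, n_units) states."""
        state = np.zeros(self.w.shape[0])
        states = []
        for u in inputs:
            state = np.tanh(self.w_in @ u + self.w @ state)
            states.append(state)
        return np.array(states)


def rbf_kernel(z):
    """Gaussian kernel matrix, bandwidth set by the median heuristic."""
    sq = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    off_diag = sq[~np.eye(len(z), dtype=bool)]
    gamma = 1.0 / max(np.median(off_diag), 1e-12)
    return np.exp(-gamma * sq)


def kcca_top_correlation(a, b, kappa=1e-3):
    """Largest regularized kernel canonical correlation between paired rows of a and b.

    With centered kernels K and c = n * kappa / 2, define R = (K + cI)^-1 K;
    the canonical correlations are then the singular values of R_a @ R_b.
    """
    n = len(a)
    h = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    c = n * kappa / 2
    r = []
    for z in (a, b):
        k = h @ rbf_kernel(z) @ h
        r.append(np.linalg.solve(k + c * np.eye(n), k))
    return float(np.linalg.svd(r[0] @ r[1], compute_uv=False)[0])


# Toy input: one second of a rising chirp plus noise, sampled at 8 kHz.
rng = np.random.default_rng(1)
sr = 8000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440 * t * (1 + t)) + 0.1 * rng.standard_normal(sr)

time_f, freq_f = time_and_freq_features(signal)
states_t = Reservoir(time_f.shape[1], seed=2).run(standardize(time_f))
states_f = Reservoir(freq_f.shape[1], seed=3).run(standardize(freq_f))
rho = kcca_top_correlation(states_t, states_f)
print(f"frames={len(time_f)}, top time-frequency canonical correlation={rho:.3f}")
```

Nothing in this sketch is fitted by gradient descent. The reservoir weights stay random and fixed, and the kernel CCA reduces to linear solves and one SVD, which loosely mirrors the abstract's point about avoiding iterative training. A full classifier would still need a readout on top of the states (for example, ridge regression), which is omitted here.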
