Audio Scanning Network: Bridging Time and Frequency Domains for Audio Classification

Authors

  • Liangwei Chen, School of Data Science, University of Science and Technology of China
  • Xiren Zhou, School of Computer Science and Technology, University of Science and Technology of China
  • Huanhuan Chen, School of Computer Science and Technology, University of Science and Technology of China

DOI:

https://doi.org/10.1609/aaai.v38i10.29015

Keywords:

ML: Time-Series/Data Streams, ML: Kernel Methods

Abstract

With the rapid growth of audio data, there is a pressing need for automatic audio classification. As a type of time-series data, audio exhibits waveform fluctuations in both the time and frequency domains that evolve over time, and similar instances share consistent patterns. This study introduces the Audio Scanning Network (ASNet), designed to leverage this information for stable and effective audio classification. ASNet captures real-time changes in audio waveforms across both the time and frequency domains through reservoir computing, and uses Reservoir Kernel Canonical Correlation Analysis (RKCCA) to explore correlations between time-domain and frequency-domain waveform fluctuations. This approach enables ASNet to capture both the changes and the inherent correlations within the audio waveform without time-consuming iterative training. Instead of converting audio into spectrograms, ASNet operates directly on audio feature sequences to uncover associations between time and frequency fluctuations. Experiments on environmental sound and music genre classification tasks show that ASNet performs comparably to state-of-the-art methods.
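
For readers unfamiliar with reservoir computing, the following is a minimal illustrative sketch of the general idea the abstract refers to: a fixed, randomly initialized recurrent layer scans a feature sequence, and only a linear readout is fitted, in closed form. It is not the authors' ASNet implementation and does not include RKCCA. All function names, the reservoir size, the spectral radius, and the ridge penalty are assumptions chosen for illustration.

```python
import numpy as np


def make_reservoir(n_inputs, n_units=50, spectral_radius=0.9, seed=0):
    """Draw fixed random input and recurrent weights (these are never trained)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_units, n_inputs))
    w = rng.uniform(-0.5, 0.5, size=(n_units, n_units))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius,
    # a common heuristic for keeping the reservoir dynamics stable.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w


def run_reservoir(seq, w_in, w):
    """Scan a (T, n_inputs) sequence and return the (T, n_units) state trajectory."""
    state = np.zeros(w.shape[0])
    states = np.empty((seq.shape[0], w.shape[0]))
    for t, u in enumerate(seq):
        # The new state depends on the current input and the previous state.
        state = np.tanh(w_in @ u + w @ state)
        states[t] = state
    return states


def fit_ridge_readout(features, targets, alpha=1e-2):
    """Closed-form ridge regression: a single linear solve, no iterative training."""
    d = features.shape[1]
    gram = features.T @ features + alpha * np.eye(d)
    return np.linalg.solve(gram, features.T @ targets)
```

In a sketch like this, the same reservoir could be run once on a time-domain feature sequence and once on a frequency-domain one, giving two state trajectories whose relationship can then be analyzed. How ASNet actually pools the states and relates the two domains is described in the paper, not here.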

Published

2024-03-24

How to Cite

Chen, L., Zhou, X., & Chen, H. (2024). Audio Scanning Network: Bridging Time and Frequency Domains for Audio Classification. Proceedings of the AAAI Conference on Artificial Intelligence, 38(10), 11355-11363. https://doi.org/10.1609/aaai.v38i10.29015

Section

AAAI Technical Track on Machine Learning I