Audio Scanning Network: Bridging Time and Frequency Domains for Audio Classification
DOI:
https://doi.org/10.1609/aaai.v38i10.29015
Keywords:
ML: Time-Series/Data Streams, ML: Kernel Methods
Abstract
With the rapid growth of audio data, there is a pressing need for automatic audio classification. As a type of time-series data, audio exhibits waveform fluctuations in both the time and frequency domains that evolve over time, and similar instances share consistent patterns. This study introduces the Audio Scanning Network (ASNet), designed to leverage this abundant information for stable and effective audio classification. ASNet captures real-time changes in audio waveforms across both time and frequency domains through reservoir computing, supported by Reservoir Kernel Canonical Correlation Analysis (RKCCA), which explores correlations between time-domain and frequency-domain waveform fluctuations. This approach enables ASNet to comprehensively capture the changes and inherent correlations within the audio waveform without time-consuming iterative training. Instead of converting audio into spectrograms, ASNet directly uses audio feature sequences to uncover associations between time and frequency fluctuations. Experiments on environmental sound and music genre classification tasks show that ASNet performs comparably to state-of-the-art methods.
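To make the reservoir computing idea in the abstract concrete, the following is a minimal sketch of a generic echo state reservoir driven by a feature sequence. It is not the authors' ASNet implementation and does not include RKCCA. The function names, reservoir size, spectral radius, leak rate, and the random toy features are all assumptions chosen for illustration. The point it shows is that the recurrent weights are fixed at random, so collecting the states requires no iterative training.

```python
import numpy as np


def make_reservoir(n_inputs, n_units=50, spectral_radius=0.9, seed=0):
    """Draw fixed random input and recurrent weights (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_units, n_inputs))
    w = rng.uniform(-0.5, 0.5, size=(n_units, n_units))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w


def run_reservoir(features, w_in, w, leak=1.0):
    """Feed a (time, n_inputs) sequence through the reservoir, one step per frame."""
    states = np.zeros((features.shape[0], w.shape[0]))
    x = np.zeros(w.shape[0])
    for t, u in enumerate(features):
        # Leaky tanh update; the weights are never trained.
        x = (1.0 - leak) * x + leak * np.tanh(w_in @ u + w @ x)
        states[t] = x
    return states


# Toy stand-in for an audio feature sequence: 100 frames, 4 features per frame.
toy_rng = np.random.default_rng(1)
toy_features = toy_rng.standard_normal((100, 4))

w_in, w = make_reservoir(n_inputs=4)
states = run_reservoir(toy_features, w_in, w)
print(states.shape)
```

In a setup of this kind, one reservoir could be run over time-domain features and another over frequency-domain features, with the two state sequences then compared by a correlation method. How ASNet does this in detail is described in the paper, not here.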
Published
2024-03-24
How to Cite
Chen, L., Zhou, X., & Chen, H. (2024). Audio Scanning Network: Bridging Time and Frequency Domains for Audio Classification. Proceedings of the AAAI Conference on Artificial Intelligence, 38(10), 11355-11363. https://doi.org/10.1609/aaai.v38i10.29015
Issue
Section
AAAI Technical Track on Machine Learning I