Sound Check: Auditing Recent Audio Dataset Practices

Authors

  • William Agnew, Carnegie Mellon University
  • Julia Barnett, Northwestern University
  • Annie Chu, Northwestern University
  • Rachel Hong, University of Washington
  • Michael Feffer, Carnegie Mellon University
  • Robin Netzorg, University of California, Berkeley
  • Harry H. Jiang, Carnegie Mellon University
  • Ezra Awumey, Carnegie Mellon University
  • Sauvik Das, Carnegie Mellon University

DOI:

https://doi.org/10.1609/aies.v8i1.36528

Abstract

Audio AI models are increasingly used for a broad range of applications including music and sound generation, text-to-speech (TTS), voice cloning, emotion analysis, transcription, and audio classification. However, we have little understanding of the datasets used to create audio AI models, a gap that leaves the field without a powerful tool for understanding potential biases, toxicity, copyright violations, and other ethical and performance issues. We conduct a mapping literature review of hundreds of audio datasets used in recent music, sound, and speech AI papers. We first assess the sourcing, size, and usage of these datasets, finding that while there are hundreds of audio datasets, few are widely used. Next, we identify nine representative datasets and conduct several analyses to understand bias, toxicity, representation, and quality. We find that these datasets are often biased against women, have stereotypes about marginalized communities, and contain significant amounts of copyrighted work. We also find that audio datasets often come with scant documentation. To address this gap, we extend Gebru’s datasheets for datasets to audio data, providing domain-specific documentation guidance. Finally, to facilitate public exploration of dataset contents and accountability, we developed an audio datasets exploration web tool which is available below in our links, along with our code and an extended version of our work including the appendix and augmented datasheets for datasets. Content warning: this paper contains discussions of offensive language.

Published

2025-10-15

How to Cite

Agnew, W., Barnett, J., Chu, A., Hong, R., Feffer, M., Netzorg, R., Jiang, H. H., Awumey, E., & Das, S. (2025). Sound Check: Auditing Recent Audio Dataset Practices. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 8(1), 26-40. https://doi.org/10.1609/aies.v8i1.36528