“Allot?” is “A Lot!” Towards Developing More Generalized Speech Recognition System for Accessible Communication


  • Grisha Bandodkar, University of California, Davis
  • Shyam Agarwal, University of California, Davis
  • Athul Krishna Sughosh, University of California, Davis
  • Sahilbir Singh, University of California, Davis
  • Taeyeong Choi, Kennesaw State University




Keywords: Deep Learning, Machine Learning, Automatic Speech Recognition, Audio And Speech Processing, Wav2vec 2.0, Sound, Computation And Language, Data Augmentation, Accented Speech


The proliferation of Automatic Speech Recognition (ASR) systems has revolutionized translation and transcription. However, challenges persist in ensuring inclusive communication for non-native English speakers. This study quantifies the gap in recognition accuracy between accented and native English speech using Wav2Vec 2.0, a state-of-the-art transformer model. Notably, we find that accented speech yields significantly higher word error rates of 30-50%, in contrast to 2-8% for native speakers (Baevski et al. 2020). We further use accessible online datasets to show the potential of improving speech recognition by fine-tuning the Wav2Vec 2.0 model. Through experimentation and analysis, we highlight the challenges of training models on accented speech. By refining models and addressing data quality issues, our work presents a pipeline for future investigations aimed at developing an integrated system capable of effectively engaging with a broader range of individuals from diverse backgrounds. Accurate recognition of accented speech is a pivotal step toward democratizing AI-driven communication products.
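
The word error rate (WER) figures quoted above follow the standard definition: the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below is only an illustration of that metric, not the authors' evaluation code. The function name `word_error_rate` is hypothetical, and the text normalization (lowercasing and whitespace splitting) is an assumption; real evaluations usually also strip punctuation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # The title's example: "a lot" recognized as "allot".
    print(word_error_rate("a lot", "allot"))
```

Under this definition, transcribing "a lot" as "allot" counts as one substitution plus one deletion against two reference words, so a single merged word can have a large effect on the score for a short utterance.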




How to Cite

Bandodkar, G., Agarwal, S., Sughosh, A. K., Singh, S., & Choi, T. (2024). “Allot?” is “A Lot!” Towards Developing More Generalized Speech Recognition System for Accessible Communication. Proceedings of the AAAI Conference on Artificial Intelligence, 38(21), 23327-23334. https://doi.org/10.1609/aaai.v38i21.30381



EAAI: Mentored Undergraduate Research Challenge: AI for Accessibility in Comm