Towards Building a Language-Independent Speech Scoring Assessment

Authors

  • Shreyansh Gupta, SHL Labs, Gurugram, India
  • Abhishek Unnam, SHL Labs, Gurugram, India
  • Kuldeep Yadav, SHL Labs, Gurugram, India
  • Varun Aggarwal, SHL Labs, Gurugram, India

DOI:

https://doi.org/10.1609/aaai.v38i21.30366

Keywords:

Speech-scoring, Wav2vec, Multilingual, Language-independent

Abstract

Automatic speech scoring is crucial in language learning: it gives learners targeted feedback by assessing pronunciation, fluency, and other speech qualities. However, the scarcity of human-labeled data for languages other than English poses a significant challenge in developing such systems. In this work, we propose a Language-Independent scoring approach that evaluates speech without relying on labeled data in the target language. We introduce a multilingual speech scoring system that leverages representations from the wav2vec 2.0 XLSR model and a forced-alignment technique based on CTC-Segmentation to construct speech features. These features are used to train a machine learning model to predict pronunciation and fluency scores. We demonstrate the potential of our method by predicting expert ratings on a speech dataset spanning five languages (English, French, Spanish, German, and Portuguese) and comparing its performance against Language-Specific models trained individually on each language, as well as a model trained jointly on all languages. Results indicate that our approach shows promise as an initial step towards a universal, language-independent speech scoring system.
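
The pipeline outlined in the abstract can be illustrated with a minimal sketch. The abstract does not specify the actual feature set or evaluation protocol, so everything below is an assumption made for illustration: the function names `utterance_features` and `leave_one_language_out`, the three alignment-derived features, and the 20 ms frame stride are hypothetical. Plain Python lists stand in for XLSR frame embeddings and for the word-level output of a CTC-based forced aligner.

```python
from statistics import mean


def utterance_features(frames, segments, frame_sec=0.02):
    """Pool frame embeddings and aligned segments into one feature vector.

    frames:   per-frame embedding vectors (stand-in for XLSR outputs)
    segments: (start_frame, end_frame, confidence) per aligned word,
              end exclusive (stand-in for forced-alignment output)
    frame_sec: assumed frame stride in seconds (hypothetical default)
    """
    dim = len(frames[0])
    # Mean-pool each embedding dimension over time.
    pooled = [mean(f[d] for f in frames) for d in range(dim)]
    total_sec = len(frames) * frame_sec
    spoken_frames = sum(end - start for start, end, _ in segments)
    return pooled + [
        mean(conf for _, _, conf in segments),  # average alignment confidence
        len(segments) / total_sec,              # aligned words per second
        1.0 - spoken_frames / len(frames),      # fraction of unaligned frames
    ]


def leave_one_language_out(samples, target_lang):
    """Split so that no labeled data from the target language is used in training."""
    train = [s for s in samples if s["lang"] != target_lang]
    test = [s for s in samples if s["lang"] == target_lang]
    return train, test
```

Under these assumptions, a regressor would be fit on the feature vectors of the `train` split and evaluated against expert ratings on the `test` split, which mirrors the idea of scoring a language for which no labeled data was available during training.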

Published

2024-03-24

How to Cite

Gupta, S., Unnam, A., Yadav, K., & Aggarwal, V. (2024). Towards Building a Language-Independent Speech Scoring Assessment. Proceedings of the AAAI Conference on Artificial Intelligence, 38(21), 23200-23206. https://doi.org/10.1609/aaai.v38i21.30366