Uncertainty-Aware Self-Training for CTC-Based Automatic Speech Recognition
DOI:
https://doi.org/10.1609/aaai.v39i23.34610
Abstract
Uncertainty estimation has been widely applied for trustworthy automatic speech recognition (ASR) systems across training and inference stages. In the training stage, previous studies show that uncertainty can facilitate self-training by filtering out unlabeled data samples with high uncertainty. However, the current sequence-level uncertainty estimation method for connectionist temporal classification (CTC) based ASR models drops the output probability information and depends only on the textual distance of decoded predictions. In this study, we argue that this results in limited performance improvement and propose a novel output probability-based sequence-level uncertainty estimation method. We also categorize uncertainty as pseudo-label uncertainty and in-training uncertainty for the self-training process. Finally, we present uncertainty-aware self-training for CTC-based ASR models and experimentally show the effectiveness of the proposed method compared to the baselines.
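The abstract does not spell out the estimator itself, so the following is only an illustrative sketch of the general idea: deriving a sequence-level uncertainty score from a CTC model's output probabilities (here, mean per-frame posterior entropy) and using it to filter unlabeled samples before pseudo-labeling. The function names, the specific entropy score, and the threshold-based filter are all assumptions for illustration, not the method proposed in the paper.

```python
import numpy as np

def sequence_uncertainty(log_probs: np.ndarray) -> float:
    """Illustrative sequence-level uncertainty from CTC frame posteriors.

    log_probs: (T, V) per-frame log-probabilities over the output
    vocabulary (including the CTC blank token). Returns the mean
    per-frame entropy; higher values indicate a less confident output,
    unlike a purely textual-distance-based score, this uses the
    probability mass directly.
    """
    probs = np.exp(log_probs)
    frame_entropy = -np.sum(probs * log_probs, axis=-1)  # shape (T,)
    return float(frame_entropy.mean())

def filter_pseudo_labels(samples: list, threshold: float) -> list:
    """Keep only unlabeled samples whose uncertainty is below threshold,
    mimicking the uncertainty-based filtering step of self-training."""
    return [s for s in samples
            if sequence_uncertainty(s["log_probs"]) < threshold]
```

In a self-training loop, a sketch like this would score each unlabeled utterance after decoding and discard high-uncertainty ones before adding their pseudo-labels to the training set; the paper's actual estimator and its pseudo-label vs. in-training uncertainty split are described in the full text.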
Published
2025-04-11
How to Cite
Kim, E., & Lee, K. (2025). Uncertainty-Aware Self-Training for CTC-Based Automatic Speech Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 39(23), 24330–24338. https://doi.org/10.1609/aaai.v39i23.34610
Section
AAAI Technical Track on Natural Language Processing II