Uncertainty-Aware Self-Training for CTC-Based Automatic Speech Recognition

Authors

  • Eungbeom Kim, Seoul National University
  • Kyogu Lee, Seoul National University

DOI:

https://doi.org/10.1609/aaai.v39i23.34610

Abstract

Uncertainty estimation has been widely applied for trustworthy automatic speech recognition (ASR) systems across training and inference stages. In the training stage, previous studies show that uncertainty can facilitate self-training by filtering out unlabeled data samples with high uncertainty. However, the current sequence-level uncertainty estimation method for connectionist temporal classification (CTC)-based ASR models drops the output probability information and depends only on the textual distance of decoded predictions. In this study, we argue that this results in limited performance improvement and propose a novel output probability-based sequence-level uncertainty estimation method. We also categorize uncertainty as pseudo-label uncertainty and in-training uncertainty for the self-training process. Finally, we present uncertainty-aware self-training for CTC-based ASR models and experimentally show the effectiveness of the proposed method compared to the baselines.
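To make the abstract's idea concrete, here is a minimal sketch of uncertainty-based filtering for self-training. It is an illustration under stated assumptions, not the paper's exact estimator: the function `entropy_uncertainty` and the threshold-filtering helper are hypothetical names, and the score shown is a simple per-frame entropy average over CTC output posteriors, standing in for a generic output probability-based sequence-level uncertainty.

```python
import numpy as np

def entropy_uncertainty(log_probs: np.ndarray) -> float:
    """Sequence-level uncertainty from CTC output probabilities.

    Hypothetical illustration (not the paper's exact method): average
    the per-frame entropy of the softmax posteriors, so higher values
    mean the model is less certain about the whole utterance.

    log_probs: (T, V) array of per-frame log-softmax outputs,
    where T is the number of frames and V the vocabulary size.
    """
    probs = np.exp(log_probs)
    frame_entropy = -(probs * log_probs).sum(axis=1)  # shape (T,)
    return float(frame_entropy.mean())

def filter_pseudo_labeled(samples, threshold: float):
    """Keep only pseudo-labeled samples whose uncertainty is below a
    threshold, mirroring the filtering step in uncertainty-aware
    self-training. `samples` is a list of (audio, pseudo_label,
    uncertainty) tuples; the tuple layout is an assumption."""
    return [(x, y) for x, y, u in samples if u < threshold]
```

A confident model (probability mass concentrated on one token per frame) yields a low score, while near-uniform posteriors yield a high one, so thresholding on this score discards the least reliable pseudo-labels.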

Published

2025-04-11

How to Cite

Kim, E., & Lee, K. (2025). Uncertainty-Aware Self-Training for CTC-Based Automatic Speech Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 39(23), 24330–24338. https://doi.org/10.1609/aaai.v39i23.34610

Section

AAAI Technical Track on Natural Language Processing II