Uncertainty-Aware Self-Training for CTC-Based Automatic Speech Recognition

Authors

  • Eungbeom Kim, Seoul National University
  • Kyogu Lee, Seoul National University

DOI:

https://doi.org/10.1609/aaai.v39i23.34610

Abstract

Uncertainty estimation has been widely applied for trustworthy automatic speech recognition (ASR) systems across training and inference stages. In the training stage, previous studies show that uncertainty can facilitate self-training by filtering out unlabeled data samples with high uncertainty. However, the current sequence-level uncertainty estimation method for connectionist temporal classification (CTC)-based ASR models drops the output probability information and depends only on the textual distance of decoded predictions. In this study, we argue that this results in limited performance improvement and propose a novel output probability-based sequence-level uncertainty estimation method. We also categorize uncertainty as pseudo-label uncertainty and in-training uncertainty for the self-training process. Finally, we present uncertainty-aware self-training for CTC-based ASR models and experimentally show the effectiveness of the proposed method compared to the baselines.
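To make the abstract's idea concrete, here is a minimal sketch of uncertainty-based filtering for self-training. It is an illustration under stated assumptions, not the paper's exact estimator: the function `entropy_uncertainty` and the threshold-filtering helper are hypothetical names, and the score shown is a simple per-frame entropy average over CTC output posteriors, standing in for a generic output probability-based sequence-level uncertainty.

```python
import numpy as np

def entropy_uncertainty(log_probs: np.ndarray) -> float:
    """Sequence-level uncertainty from CTC output probabilities.

    Hypothetical illustration (not the paper's exact method): average
    the per-frame entropy of the softmax posteriors, so higher values
    mean the model is less certain about the whole utterance.

    log_probs: (T, V) array of per-frame log-softmax outputs,
    where T is the number of frames and V the vocabulary size.
    """
    probs = np.exp(log_probs)
    frame_entropy = -(probs * log_probs).sum(axis=1)  # shape (T,)
    return float(frame_entropy.mean())

def filter_pseudo_labeled(samples, threshold: float):
    """Keep only pseudo-labeled samples whose uncertainty is below a
    threshold, mirroring the filtering step in uncertainty-aware
    self-training. `samples` is a list of (audio, pseudo_label,
    uncertainty) tuples; the tuple layout is an assumption."""
    return [(x, y) for x, y, u in samples if u < threshold]
```

A confident model (probability mass concentrated on one token per frame) yields a low score, while near-uniform posteriors yield a high one, so thresholding on this score discards the least reliable pseudo-labels.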

Published

2025-04-11

How to Cite

Kim, E., & Lee, K. (2025). Uncertainty-Aware Self-Training for CTC-Based Automatic Speech Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 39(23), 24330–24338. https://doi.org/10.1609/aaai.v39i23.34610

Section

AAAI Technical Track on Natural Language Processing II