Transcription System Using Automatic Speech Recognition for the Japanese Parliament (Diet)
DOI: https://doi.org/10.1609/aaai.v26i2.18962
Abstract
This article describes a new automatic transcription system in the Japanese Parliament which deploys our automatic speech recognition (ASR) technology. To achieve high recognition performance on spontaneous meeting speech, we have investigated an efficient training scheme with minimal supervision that can exploit a huge amount of real data. Specifically, we have proposed a lightly-supervised training scheme based on statistical language model transformation, which bridges the gap between faithful transcripts of spoken utterances and the final texts produced for documentation. Once this mapping is trained, we no longer need faithful transcripts to train either the acoustic or the language model. Instead, we can fully exploit the speech and text data available in Parliament as they are. This scheme also realizes a sustainable ASR system that evolves, i.e., updates and re-trains its models, using only the speech and text generated during system operation. The ASR system has been deployed in the Japanese Parliament since 2010 and has consistently achieved a character accuracy of nearly 90%, which is useful for streamlining the transcription process.
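As an illustration of the reported metric, the following is a minimal sketch of how character accuracy could be computed for a recognition hypothesis against a reference transcript. The abstract does not state the exact scoring formula, so the definition used here (one minus the character-level edit distance divided by the reference length) is an assumption, and the function names `edit_distance` and `character_accuracy` are hypothetical, not taken from the system.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j characters of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def character_accuracy(ref: str, hyp: str) -> float:
    """Assumed definition: 1 - (substitutions + deletions + insertions) / len(ref)."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return 1.0 - edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Illustrative strings only; not data from the Parliament corpus.
    reference = "本日の会議を開きます"
    hypothesis = "本日の会議を開きました"
    print(f"character accuracy: {character_accuracy(reference, hypothesis):.2f}")
```

Because insertions count as errors under this definition, the value can fall below zero for very poor hypotheses. A character-level measure is a natural fit for Japanese, where text is not whitespace-delimited into words.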