Divergence-Guided Simultaneous Speech Translation
DOI:
https://doi.org/10.1609/aaai.v38i16.29733
Keywords:
NLP: Machine Translation, Multilinguality, Cross-Lingual NLP, NLP: Speech
Abstract
To achieve high-quality translation with low latency, a Simultaneous Speech Translation (SimulST) system relies on a policy module to decide whether to translate immediately or wait for additional streaming input, along with a translation model capable of effectively handling partial speech input. Prior research has tackled these components separately, either using "wait-k" policies based on fixed-length segments or detected word boundaries, or dynamic policies based on different strategies (e.g., meaningful units), while employing offline models for prefix-to-prefix translation. In this paper, we propose Divergence-Guided Simultaneous Speech Translation (DiG-SST), a tightly integrated approach focusing on both translation quality and latency for streaming input. Specifically, we introduce a simple yet effective prefix-based strategy for training translation models with partial speech input, and develop an adaptive policy that makes read/write decisions for the translation model based on the expected divergence in translation distributions resulting from future input. Our experiments on multiple translation directions of the MuST-C benchmark demonstrate that our approach achieves a better trade-off between translation quality and latency compared to existing methods.
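The core idea of the adaptive policy can be illustrated with a minimal sketch: compare the model's translation distribution given the current partial input against an estimate of the distribution after more input arrives, and READ more speech only when the two are expected to diverge substantially. This is an illustrative simplification, not the paper's implementation: the function names, the use of KL divergence as the divergence measure, and the fixed threshold are all assumptions for exposition.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two categorical distributions over target tokens.
    eps guards against log(0) for zero-probability entries."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def decide_action(current_dist, anticipated_dist, threshold=0.1):
    """Hypothetical read/write rule: if future input is expected to change
    the next-token distribution by more than `threshold`, wait for more
    speech (READ); otherwise commit the next translation token (WRITE)."""
    divergence = kl_divergence(anticipated_dist, current_dist)
    return "READ" if divergence > threshold else "WRITE"

# Toy example over a 3-token target vocabulary.
# Stable prediction: more input barely shifts the distribution -> WRITE.
stable = decide_action([0.7, 0.2, 0.1], [0.68, 0.22, 0.1])
# Volatile prediction: future input flips the top candidate -> READ.
volatile = decide_action([0.7, 0.2, 0.1], [0.2, 0.7, 0.1])
```

In the actual system the anticipated distribution cannot be observed at inference time, so the expected divergence must be predicted from the current prefix; this sketch only shows the decision rule itself.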
Published
2024-03-24
How to Cite
Chen, X., Fan, K., Luo, W., Zhang, L., Zhao, L., Liu, X., & Huang, Z. (2024). Divergence-Guided Simultaneous Speech Translation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(16), 17799-17807. https://doi.org/10.1609/aaai.v38i16.29733
Issue
Section
AAAI Technical Track on Natural Language Processing I