Improving Simultaneous Machine Translation with Monolingual Data

Hexuan Deng; Liang Ding; Xuebo Liu; Meishan Zhang; Dacheng Tao; Min Zhang

doi:10.1609/aaai.v37i11.26497

Authors

Hexuan Deng Institute of Computing and Intelligence, Harbin Institute of Technology, Shenzhen, China
Liang Ding JD Explore Academy
Xuebo Liu Institute of Computing and Intelligence, Harbin Institute of Technology, Shenzhen, China
Meishan Zhang Institute of Computing and Intelligence, Harbin Institute of Technology, Shenzhen, China
Dacheng Tao JD Explore Academy
Min Zhang Institute of Computing and Intelligence, Harbin Institute of Technology, Shenzhen, China

DOI:

https://doi.org/10.1609/aaai.v37i11.26497

Keywords:

SNLP: Machine Translation & Multilinguality

Abstract

Simultaneous machine translation (SiMT) models are usually trained via sequence-level knowledge distillation (Seq-KD) from a full-sentence neural machine translation (NMT) model. However, there is still a significant performance gap between NMT and SiMT. In this work, we propose to leverage monolingual data to improve SiMT by training a SiMT student on a combination of bilingual data and external monolingual data distilled by Seq-KD. Preliminary experiments on En-Zh and En-Ja news domain corpora demonstrate that monolingual data can significantly improve translation quality (e.g., +3.15 BLEU on En-Zh). Inspired by the behavior of human simultaneous interpreters, we propose a novel monolingual sampling strategy for SiMT that considers both chunk length and monotonicity. Experimental results show that our sampling strategy consistently outperforms random sampling (and other conventional NMT monolingual sampling strategies) by avoiding the key problem of SiMT, hallucination, and that it scales better. We achieve an average improvement of +0.72 BLEU over random sampling on En-Zh and En-Ja. Data and code can be found at https://github.com/hexuandeng/Mono4SiMT.
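
The abstract does not spell out how monotonicity is measured, so the following is only a minimal sketch of one plausible proxy, not the authors' implementation. It assumes each monolingual source sentence has already been translated by the teacher (Seq-KD) and word-aligned by some external aligner. It scores each pair by the fraction of crossing alignment links and keeps the most monotonic sentences. The names `inversion_rate` and `select_monotonic` and the alignment format (lists of `(source_index, target_index)` pairs) are hypothetical, and the chunk-length criterion is not covered.

```python
from typing import List, Tuple

Alignment = List[Tuple[int, int]]  # (source_index, target_index) links; assumed format


def inversion_rate(alignment: Alignment) -> float:
    """Fraction of link pairs that cross; 0.0 means fully monotonic."""
    # Order links by source position, then read off the target positions.
    targets = [t for _, t in sorted(alignment)]
    n = len(targets)
    if n < 2:
        return 0.0
    # A pair crosses when a later source word aligns to an earlier target word.
    crossings = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if targets[i] > targets[j]
    )
    return crossings / (n * (n - 1) / 2)


def select_monotonic(
    candidates: List[Tuple[str, Alignment]], keep: int
) -> List[str]:
    """Keep the `keep` sentences whose alignments have the fewest crossings."""
    ranked = sorted(candidates, key=lambda c: inversion_rate(c[1]))
    return [sentence for sentence, _ in ranked[:keep]]


if __name__ == "__main__":
    pool = [
        ("sentence with reordering", [(0, 2), (1, 1), (2, 0)]),
        ("monotonic sentence", [(0, 0), (1, 1), (2, 2)]),
    ]
    print(select_monotonic(pool, keep=1))
```

The quadratic pair count is adequate for sentence-length inputs; a merge-sort inversion count would be the usual replacement if the candidate pool made this a bottleneck.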
