Non-autoregressive Translation with Layer-Wise Prediction and Deep Supervision

Authors

  • Chenyang Huang, University of Alberta; Bytedance AI Lab
  • Hao Zhou, Bytedance AI Lab
  • Osmar R. Zaïane, University of Alberta
  • Lili Mou, University of Alberta
  • Lei Li, University of California, Santa Barbara

DOI:

https://doi.org/10.1609/aaai.v36i10.21323

Keywords:

Speech & Natural Language Processing (SNLP)

Abstract

How do we perform efficient inference while retaining high translation quality? Existing neural machine translation models, such as the Transformer, achieve high performance, but they decode words one by one, which is inefficient. Recent non-autoregressive translation models speed up inference, but their translation quality remains inferior. In this work, we propose DSLP, a highly efficient and high-performance model for machine translation. The key insight is to train a non-autoregressive Transformer with Deep Supervision and feed it additional Layer-wise Predictions. We conducted extensive experiments on four translation tasks (both directions of WMT'14 EN-DE and WMT'16 EN-RO). Results show that our approach consistently improves BLEU scores over the respective base models. Specifically, our best variant outperforms the autoregressive model on three of the four translation tasks while being 14.8 times faster at inference.
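
The key idea stated above (predicting the target at every decoder layer, feeding that prediction into the next layer, and supervising every layer with a translation loss) can be illustrated with a short PyTorch sketch. This is a minimal illustration under assumed hyperparameters, not the authors' implementation; names such as DSLPDecoderSketch, fuse, and deep_supervision_loss are hypothetical.

    # Minimal sketch (not the authors' code) of layer-wise prediction with
    # deep supervision in a non-autoregressive Transformer decoder.
    import torch
    import torch.nn as nn

    class DSLPDecoderSketch(nn.Module):
        def __init__(self, num_layers=6, d_model=512, nhead=8, vocab_size=32000):
            super().__init__()
            self.layers = nn.ModuleList(
                nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
                for _ in range(num_layers)
            )
            self.out_proj = nn.Linear(d_model, vocab_size)  # shared output head
            self.embed = nn.Embedding(vocab_size, d_model)  # shared embedding
            # fuses the previous layer's prediction back into the hidden state
            self.fuse = nn.Linear(2 * d_model, d_model)

        def forward(self, h, memory):
            """h: (batch, tgt_len, d_model) initial decoder input;
            memory: encoder output. Returns one logit tensor per layer."""
            all_logits = []
            for i, layer in enumerate(self.layers):
                # no causal mask: all target positions are decoded in parallel
                h = layer(h, memory)
                logits = self.out_proj(h)        # layer-wise prediction
                all_logits.append(logits)
                if i < len(self.layers) - 1:
                    # feed the intermediate prediction to the next layer
                    pred_emb = self.embed(logits.argmax(-1))
                    h = self.fuse(torch.cat([h, pred_emb], dim=-1))
            return all_logits

    def deep_supervision_loss(all_logits, target, pad_idx=0):
        # deep supervision: average the cross-entropy over every layer
        ce = nn.CrossEntropyLoss(ignore_index=pad_idx)
        return sum(ce(l.transpose(1, 2), target) for l in all_logits) / len(all_logits)

At inference time, only the top layer's output is taken as the translation; the intermediate predictions serve as inputs to the subsequent layers, so the extra supervision adds training signal without an autoregressive loop.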

Published

2022-06-28

How to Cite

Huang, C., Zhou, H., Zaïane, O. R., Mou, L., & Li, L. (2022). Non-autoregressive Translation with Layer-Wise Prediction and Deep Supervision. Proceedings of the AAAI Conference on Artificial Intelligence, 36(10), 10776-10784. https://doi.org/10.1609/aaai.v36i10.21323

Section

AAAI Technical Track on Speech and Natural Language Processing