Learning Light-Weight Translation Models from Deep Transformer

Bei Li; Ziyang Wang; Hui Liu; Quan Du; Tong Xiao; Chunliang Zhang; Jingbo Zhu

doi:10.1609/aaai.v35i15.17561

Authors

Bei Li, Northeastern University, China
Ziyang Wang, Northeastern University
Hui Liu, Northeastern University
Quan Du, Northeastern University; NiuTrans Research
Tong Xiao, Northeastern University; NiuTrans Research
Chunliang Zhang, Northeastern University, China; NiuTrans Research
Jingbo Zhu, Northeastern University, China; NiuTrans Research

DOI:

https://doi.org/10.1609/aaai.v35i15.17561

Keywords:

Machine Translation & Multilinguality

Abstract

Deep models have recently shown tremendous improvements in neural machine translation (NMT). However, such systems are computationally expensive and memory intensive. In this paper, we take a natural step towards learning strong but light-weight NMT systems. We propose a novel group-permutation based knowledge distillation approach that compresses a deep Transformer model into a shallow one. Experimental results on several benchmarks validate the effectiveness of our method: the compressed model is 8 times shallower than the deep model, with almost no loss in BLEU. To further enhance the teacher model, we present a Skipping Sub-Layer method that randomly omits sub-layers to introduce perturbation into training, which achieves a BLEU score of 30.63 on English-German newstest2014. The code is publicly available at https://github.com/libeineu/GPKD.
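
The idea of randomly omitting sub-layers during training can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository above for that). It assumes a residual stack in which each sub-layer computes x + f(x), a single constant skip probability, and skipping applied only in training mode; the abstract does not specify how the skip probability is chosen. The names `residual_stack`, `skip_prob`, and `halve` are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]
SubLayer = Callable[[Vector], Vector]


def residual_stack(
    x: Vector,
    sublayers: Sequence[SubLayer],
    skip_prob: float,
    training: bool,
    rng: random.Random,
) -> Vector:
    """Apply residual sub-layers, randomly skipping some during training.

    Assumption: a skipped sub-layer reduces to the identity, so only the
    residual path carries the input forward.
    """
    for f in sublayers:
        # Skipping happens only in training; inference uses every sub-layer.
        if training and rng.random() < skip_prob:
            continue
        fx = f(x)
        x = [a + b for a, b in zip(x, fx)]  # residual connection: x + f(x)
    return x


def halve(v: Vector) -> Vector:
    """Toy stand-in for an attention or feed-forward sub-layer."""
    return [0.5 * a for a in v]


rng = random.Random(0)
layers = [halve, halve]

# Inference: both sub-layers are applied.
full = residual_stack([1.0, 2.0], layers, skip_prob=0.5, training=False, rng=rng)

# Training with skip_prob=1.0: every sub-layer is skipped, input passes through.
skipped = residual_stack([1.0, 2.0], layers, skip_prob=1.0, training=True, rng=rng)
```

With `skip_prob` strictly between 0 and 1, each training step sees a different random subset of sub-layers, which is the perturbation the abstract refers to.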
