Dual Transfer Learning for Neural Machine Translation with Marginal Distribution Regularization

Authors

  • Yijun Wang University of Science and Technology of China
  • Yingce Xia University of Science and Technology of China
  • Li Zhao Microsoft Research Asia
  • Jiang Bian Microsoft Research Asia
  • Tao Qin Microsoft Research Asia
  • Guiquan Liu University of Science and Technology of China
  • Tie-Yan Liu Microsoft Research Asia

DOI:

https://doi.org/10.1609/aaai.v32i1.11999

Keywords:

transfer learning, semi-supervised neural machine translation, importance sampling

Abstract

Neural machine translation (NMT) heavily relies on parallel bilingual data for training. Since large-scale, high-quality parallel corpora are usually costly to collect, it is appealing to exploit monolingual corpora to improve NMT. Inspired by the law of total probability, which connects the probability of a given target-side monolingual sentence to the conditional probability of translating from a source sentence to the target one, we propose to explicitly exploit this connection to learn from and regularize the training of NMT models using monolingual data. The key technical challenge of this approach is that there are exponentially many source sentences for a target monolingual sentence when computing the sum of the conditional probabilities over all possible source sentences. We address this challenge by leveraging the dual translation model (target-to-source translation) to sample several most likely source-side sentences, avoiding the need to enumerate all possible candidate source sentences. That is, we transfer the knowledge contained in the dual model to boost the training of the primal model (source-to-target translation), and we call this approach dual transfer learning. Experimental results on English-French and German-English translation tasks demonstrate that dual transfer learning achieves significant improvements over several strong baselines and obtains new state-of-the-art results.
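A minimal sketch of the marginal-distribution idea described in the abstract, in our own notation (x: source sentence, y: target monolingual sentence, P(y|x; \theta): primal translation model, q(x|y): dual target-to-source model used as an importance-sampling proposal, \hat{P}: empirical or language-model estimates of the marginals, K: number of sampled sources); the exact formulation in the paper may differ:

P(y) = \sum_{x} \hat{P}(x)\, P(y \mid x; \theta)
     = \mathbb{E}_{x \sim q(\cdot \mid y)}\!\left[ \frac{\hat{P}(x)\, P(y \mid x; \theta)}{q(x \mid y)} \right]
     \approx \frac{1}{K} \sum_{k=1}^{K} \frac{\hat{P}(x_k)\, P(y \mid x_k; \theta)}{q(x_k \mid y)},
     \qquad x_k \sim q(\cdot \mid y).

The regularizer then presumably encourages this importance-sampled estimate to agree with the empirical marginal \hat{P}(y) of each target-side monolingual sentence, so that monolingual data constrains the training of the primal model P(y|x; \theta).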

Published

2018-04-27

How to Cite

Wang, Y., Xia, Y., Zhao, L., Bian, J., Qin, T., Liu, G., & Liu, T.-Y. (2018). Dual Transfer Learning for Neural Machine Translation with Marginal Distribution Regularization. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). https://doi.org/10.1609/aaai.v32i1.11999