Dual Transfer Learning for Neural Machine Translation with Marginal Distribution Regularization
DOI: https://doi.org/10.1609/aaai.v32i1.11999
Keywords: transfer learning, semi-supervised neural machine translation, importance sampling
Abstract
Neural machine translation (NMT) heavily relies on parallel
bilingual data for training. Since large-scale, high-quality
parallel corpora are usually costly to collect, it is appealing
to exploit monolingual corpora to improve NMT. Inspired by
the law of total probability, which connects the probability of
a given target-side monolingual sentence to the conditional
probability of translating from a source sentence to the target
one, we propose to explicitly exploit this connection to
learn from and regularize the training of NMT models using
monolingual data. The key technical challenge of this approach
is that computing the marginal probability of a target monolingual
sentence requires summing the conditional translation probability
over exponentially many candidate source sentences.
We address this challenge by leveraging the dual translation
model (target-to-source translation) to sample several
most likely source-side sentences, thereby avoiding the enumeration
of all possible candidate source sentences. That is, we transfer
the knowledge contained in the dual model to boost the
training of the primal model (source-to-target translation),
and we call such an approach dual transfer learning. Experimental
results on English-French and German-English translation tasks
demonstrate that dual transfer learning achieves significant
improvement over several strong baselines and obtains new
state-of-the-art results.
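
As a sketch of the idea, the law of total probability ties the marginal probability of a target sentence y to the primal translation model, and the intractable sum over source sentences can be estimated by importance sampling with the dual model Q(x|y) as the proposal:

```latex
P(y) \;=\; \sum_{x} P(x)\, P(y \mid x; \theta)
\;\approx\; \frac{1}{K} \sum_{k=1}^{K} \frac{P(x_k)\, P(y \mid x_k; \theta)}{Q(x_k \mid y)},
\qquad x_k \sim Q(\cdot \mid y).
```

The following minimal Python sketch shows one plausible form of the resulting regularizer: a squared gap between an empirical marginal log p̂(y) (e.g., from a target-side language model) and the importance-sampling estimate of log P(y). All interface names (log_p_lm, log_p_src, log_p_primal, sample_dual, log_q_dual) and the exact penalty form are hypothetical placeholders for illustration, not the paper's released implementation.

```python
import math
import random

def log_sum_exp(vals):
    """Numerically stable log(sum(exp(v) for v in vals))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def marginal_regularizer(y, log_p_lm, log_p_src, log_p_primal,
                         sample_dual, log_q_dual, k=5):
    """Squared gap between log p_hat(y) and an importance-sampling
    estimate of log P(y) = log sum_x P(x) P(y|x; theta).

    Hypothetical interfaces (assumptions for illustration):
      log_p_lm(y)        -- log p_hat(y) from a target-side language model
      log_p_src(x)       -- log P(x) from a source-side language model
      log_p_primal(x, y) -- log P(y|x; theta), source-to-target NMT model
      sample_dual(y)     -- draw x ~ Q(x|y) from the dual
                            (target-to-source) model
      log_q_dual(x, y)   -- log Q(x|y) for a sampled x
    """
    xs = [sample_dual(y) for _ in range(k)]
    # log of each term P(x_k) * P(y|x_k) / Q(x_k|y), kept in log-space.
    terms = [log_p_src(x) + log_p_primal(x, y) - log_q_dual(x, y)
             for x in xs]
    # Average the K importance weights: log((1/K) * sum_k exp(term_k)).
    log_p_y_est = log_sum_exp(terms) - math.log(k)
    return (log_p_lm(y) - log_p_y_est) ** 2

# Toy smoke test with uniform dummy models (illustration only).
if __name__ == "__main__":
    random.seed(0)
    sources = ["x1", "x2", "x3"]
    penalty = marginal_regularizer(
        y="a target sentence",
        log_p_lm=lambda y: -5.0,
        log_p_src=lambda x: -3.0,
        log_p_primal=lambda x, y: -4.0,
        sample_dual=lambda y: random.choice(sources),
        log_q_dual=lambda x, y: -math.log(len(sources)),
        k=5,
    )
    print(f"penalty: {penalty:.3f}")
```

During training, a term of this form computed on target-side monolingual sentences would be added, with a trade-off weight, to the usual maximum-likelihood objective on bilingual data, so that the monolingual corpus regularizes the primal model without requiring parallel pairs.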