Improving Context-Aware Neural Machine Translation Using Self-Attentive Sentence Embedding
Fully Attentional Networks (FAN) such as the Transformer (Vaswani et al. 2017) have shown superior results in Neural Machine Translation (NMT) and have become a solid baseline for translation tasks. More recent studies have also reported that additional contextual sentences improve the translation quality of NMT models (Voita et al. 2018; Müller et al. 2018; Zhang et al. 2018). However, those studies have exploited multiple context sentences as a single long concatenated sentence, which may cause the models to suffer from inefficient computational complexity and long-range dependencies. In this paper, we propose the Hierarchical Context Encoder (HCE), which exploits multiple context sentences separately using a hierarchical FAN structure. Our proposed encoder first abstracts sentence-level information from preceding sentences in a self-attentive way, and then hierarchically encodes context-level information. Through extensive experiments, we observe that our HCE records the best performance measured in BLEU score on English-German, English-Turkish, and English-Korean corpora. In addition, we observe that our HCE records the best performance on a crowd-sourced test set designed to evaluate how well an encoder can exploit contextual information. Finally, evaluation on an English-Korean pronoun resolution test suite also shows that our HCE can properly exploit contextual information.
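The two-stage encoding described above can be illustrated with a minimal sketch. It is not the model evaluated in this paper: it assumes single-head attention with identity projections, no positional encodings or feed-forward layers, and a single pooling query vector standing in for learned parameters. All function names (`self_attention`, `sentence_embedding`, `hierarchical_context_encode`) and the `pooling_query` parameter are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def weighted_sum(weights, vectors):
    """Combine equally sized vectors using the given weights."""
    dim = len(vectors[0])
    return [sum(w * vec[i] for w, vec in zip(weights, vectors)) for i in range(dim)]


def self_attention(vectors):
    """Scaled dot-product self-attention.

    Assumption: queries, keys, and values are the inputs themselves
    (identity projections), a simplification of a real FAN layer.
    """
    scale = math.sqrt(len(vectors[0]))
    outputs = []
    for query in vectors:
        weights = softmax([dot(query, key) / scale for key in vectors])
        outputs.append(weighted_sum(weights, vectors))
    return outputs


def sentence_embedding(token_vectors, pooling_query):
    """Stage 1: encode one sentence, then pool its tokens into one vector.

    `pooling_query` is a hypothetical stand-in for learned pooling weights.
    """
    encoded = self_attention(token_vectors)
    weights = softmax([dot(pooling_query, h) for h in encoded])
    return weighted_sum(weights, encoded)


def hierarchical_context_encode(context_sentences, pooling_query):
    """Stage 2: attend over the sentence vectors, one per context sentence."""
    sentence_vectors = [sentence_embedding(s, pooling_query) for s in context_sentences]
    return self_attention(sentence_vectors)


# Toy usage: three context sentences of 4, 7, and 5 tokens, dimension 8.
random.seed(0)
dim = 8
context = [[[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(length)]
           for length in (4, 7, 5)]
pooling_query = [random.gauss(0.0, 1.0) for _ in range(dim)]
context_vectors = hierarchical_context_encode(context, pooling_query)
print(len(context_vectors), len(context_vectors[0]))  # one vector per context sentence
```

In this sketch each sentence is attended over on its own, so the token-level attention cost grows with the sum of the squared sentence lengths rather than with the square of the total concatenated length. The context-level attention then operates on only one vector per sentence.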