Attention-via-Attention Neural Machine Translation
Keywords: translation, lexical similarity, character level
Because many languages descend from a common ancestral language and continue to influence one another, they inevitably share similarities, such as lexical similarity and named entity similarity. In this paper, we leverage these similarities to improve translation performance in neural machine translation. Specifically, we introduce an attention-via-attention mechanism that allows source-side character information to flow directly to the target side. With this mechanism, target-side characters are generated from the representations of source-side characters when the source and target words are similar. For instance, our proposed neural machine translation system learns to transfer the character-level information of the English word "system" through the attention-via-attention mechanism to generate the Czech word "systém." As a result, our approach not only achieves competitive translation performance but also significantly reduces the model size.
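To make the notion of lexical similarity concrete, the following minimal sketch scores the character overlap between a source word and a target word. It is an illustration only and not part of the proposed model: the function name `char_similarity` is hypothetical, and the matching-block ratio from Python's standard `difflib` module is an assumed stand-in for whatever similarity the attention-via-attention mechanism learns implicitly.

```python
from difflib import SequenceMatcher


def char_similarity(source_word: str, target_word: str) -> float:
    """Return a character-level similarity score in [0, 1].

    Hypothetical helper for illustration: the score is
    2 * (matched characters) / (total characters in both words),
    as computed by difflib.SequenceMatcher.ratio().
    """
    return SequenceMatcher(None, source_word, target_word).ratio()


if __name__ == "__main__":
    # English/Czech pair from the abstract: "syst" and "m" match, "e"/"é" do not.
    print(round(char_similarity("system", "systém"), 3))
    # A pair with no shared characters (English "dog", Czech "pes").
    print(round(char_similarity("dog", "pes"), 3))
```

Under this assumed measure, a pair such as "system"/"systém" scores high because only one character differs, which is the kind of word pair where copying source-side character information is expected to help.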