TY - JOUR
AU - Sun, Zewei
AU - Huang, Shujian
AU - Wei, Hao-Ran
AU - Dai, Xin-yu
AU - Chen, Jiajun
PY - 2020/04/03
Y2 - 2024/03/28
TI - Generating Diverse Translation by Manipulating Multi-Head Attention
JF - Proceedings of the AAAI Conference on Artificial Intelligence
JA - AAAI
VL - 34
IS - 05
SE - AAAI Technical Track: Natural Language Processing
DO - 10.1609/aaai.v34i05.6429
UR - https://ojs.aaai.org/index.php/AAAI/article/view/6429
SP - 8976
EP - 8983
AB - The Transformer model (Vaswani et al. 2017) has been widely used in machine translation tasks and obtained state-of-the-art results. In this paper, we report an interesting phenomenon in its encoder-decoder multi-head attention: different attention heads of the final decoder layer align to different word translation candidates. We empirically verify this discovery and propose a method to generate diverse translations by manipulating heads. Furthermore, we make use of these diverse translations with the back-translation technique for better data augmentation. Experimental results show that our method generates diverse translations without a severe drop in translation quality. Experiments also show that back-translation with these diverse translations could bring a significant improvement in performance on translation tasks. An auxiliary experiment on a conversation response generation task demonstrates the effect of diversity as well.
ER -