Word Attention for Sequence to Sequence Text Understanding

Lijun Wu; Fei Tian; Li Zhao; Jianhuang Lai; Tie-Yan Liu

doi:10.1609/aaai.v32i1.11971

Authors

Lijun Wu, Sun Yat-sen University
Fei Tian, Microsoft Research
Li Zhao, Microsoft Research
Jianhuang Lai, Sun Yat-sen University
Tie-Yan Liu, Microsoft Research

DOI:

https://doi.org/10.1609/aaai.v32i1.11971

Abstract

The attention mechanism has been a key component of the Recurrent Neural Network (RNN) based sequence-to-sequence learning framework, which has been adopted in many text understanding tasks, such as neural machine translation and abstractive summarization. In these tasks, the attention mechanism models how important each part of the source sentence is for generating a target-side word. To compute such importance scores, the attention mechanism summarizes the source-side information in the encoder RNN hidden states (i.e., h_t), and then builds a context vector for a target-side word upon a subsequence representation of the source sentence, since h_t actually summarizes the information of the subsequence containing the first t words of the source sentence. In this paper, we show that an additional attention mechanism called word attention, which builds itself upon word-level representations, significantly enhances the performance of sequence-to-sequence learning. Our word attention can enrich the source-side contextual representation by directly promoting the clean word-level information in each step. Furthermore, we propose to use contextual gates to dynamically combine the subsequence-level and word-level contextual information. Experimental results on abstractive summarization and neural machine translation show that word attention significantly improves over strong baselines.
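
The idea described in the abstract can be illustrated with a minimal sketch: one attention pass over the encoder hidden states h_t (subsequence level), a second pass over the raw word embeddings (word level), and a gate that mixes the two context vectors. The abstract does not give the scoring function or the gate's parameterization, so everything below is an assumption made for illustration: dot-product scoring, an element-wise sigmoid gate, and the names `attend`, `gated_context`, `gate_h`, and `gate_w` are all hypothetical and should not be read as the paper's formulation.

```python
import math


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attend(query, vectors):
    """Dot-product attention (an assumed scoring function).

    Returns the weighted sum of `vectors` and the attention weights.
    """
    weights = softmax([dot(query, v) for v in vectors])
    dim = len(vectors[0])
    context = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return context, weights


def gated_context(query, hidden_states, word_embeddings, gate_h, gate_w):
    """Mix subsequence-level and word-level contexts with a gate.

    `gate_h` and `gate_w` are hypothetical per-dimension gate parameters;
    the element-wise sigmoid gate is an assumption, not the paper's form.
    """
    c_hidden, _ = attend(query, hidden_states)   # context over RNN states h_t
    c_word, _ = attend(query, word_embeddings)   # context over word embeddings
    gates = [sigmoid(a * h + b * w)
             for a, b, h, w in zip(gate_h, gate_w, c_hidden, c_word)]
    # Convex combination per dimension: g * c_hidden + (1 - g) * c_word.
    return [g * h + (1.0 - g) * w for g, h, w in zip(gates, c_hidden, c_word)]


# Toy example: three source positions, two-dimensional vectors.
hidden_states = [[0.1, 0.3], [0.4, 0.2], [0.5, 0.9]]
word_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [0.2, 0.7]  # stands in for the decoder state at one target step

context = gated_context(query, hidden_states, word_embeddings,
                        gate_h=[0.0, 0.0], gate_w=[0.0, 0.0])
```

With all gate parameters set to zero, every gate value is 0.5, so the result is the plain average of the two context vectors; in a trained model the gate parameters would be learned so that the mix varies by target step.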
