SEAT: Stable and Explainable Attention

Lijie Hu; Yixin Liu; Ninghao Liu; Mengdi Huai; Lichao Sun; Di Wang

doi:10.1609/aaai.v37i11.26517

Authors

Lijie Hu, King Abdullah University of Science and Technology
Yixin Liu, Lehigh University
Ninghao Liu, University of Georgia
Mengdi Huai, Iowa State University
Lichao Sun, Lehigh University
Di Wang, King Abdullah University of Science and Technology; Computational Bioscience Research Center; SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence

DOI:

https://doi.org/10.1609/aaai.v37i11.26517

Keywords:

SNLP: Interpretability & Analysis of NLP Models, ML: Transparent, Interpretable, Explainable ML

Abstract

The attention mechanism has become a standard fixture in many state-of-the-art natural language processing (NLP) models, not only because of its outstanding performance, but also because it provides plausible innate explanations for neural architectures. However, recent studies show that attention is unstable against randomness and perturbations during training or testing, such as random seeds and slight perturbations of embeddings, which prevents it from being a faithful explanation tool. A natural question is therefore whether we can find an alternative to vanilla attention that is more stable while keeping the key characteristics of the explanation. In this paper, we provide a rigorous definition of such an attention method, named SEAT (Stable and Explainable ATtention). Specifically, SEAT has the following three properties: (1) its prediction distribution is close to the prediction of the vanilla attention; (2) its top-k indices largely overlap with those of the vanilla attention; (3) it is robust w.r.t. perturbations, i.e., any slight perturbation on SEAT will not change the attention and prediction distribution too much, which implicitly indicates that it is stable to randomness and perturbations. Furthermore, we propose an optimization method for obtaining SEAT, which can be viewed as revising the vanilla attention. Finally, through intensive experiments on various datasets, we compare SEAT with other baseline methods using RNN, BiLSTM and BERT architectures, with different evaluation metrics on model interpretation, stability and accuracy. Results show that, besides preserving the original explainability and model performance, SEAT is more stable against input perturbations and training randomness, which indicates that it is a more faithful explanation.
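Properties (1) and (2) can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not confirm: "close" is measured here by total variation distance, top-k overlap is taken as the fraction of shared indices, and all function names and numbers are hypothetical and chosen only for illustration. It is not the paper's formal definition or optimization method.

```python
from typing import Sequence


def top_k_indices(weights: Sequence[float], k: int) -> set:
    """Indices of the k largest weights (ties broken by lower index)."""
    order = sorted(range(len(weights)), key=lambda i: (-weights[i], i))
    return set(order[:k])


def top_k_overlap(a: Sequence[float], b: Sequence[float], k: int) -> float:
    """Fraction of top-k indices shared by two attention vectors."""
    return len(top_k_indices(a, k) & top_k_indices(b, k)) / k


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance between two discrete distributions.

    Assumption: used here as a stand-in for prediction closeness.
    """
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


# Hypothetical attention weights over five tokens.
vanilla_attention = [0.50, 0.30, 0.10, 0.06, 0.04]
candidate_attention = [0.45, 0.10, 0.32, 0.08, 0.05]

# Hypothetical class-probability outputs for a binary task.
vanilla_prediction = [0.80, 0.20]
candidate_prediction = [0.75, 0.25]

# Property (2): how much the top-2 indices agree.
overlap = top_k_overlap(vanilla_attention, candidate_attention, k=2)

# Property (1): how far apart the two prediction distributions are.
distance = total_variation(vanilla_prediction, candidate_prediction)

print(f"top-2 overlap: {overlap:.2f}")
print(f"prediction distance: {distance:.3f}")
```

Under the same assumptions, property (3) could be probed by applying the two functions to attention and predictions computed before and after a small perturbation of the input embeddings.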
