SEAT: Stable and Explainable Attention


  • Lijie Hu King Abdullah University of Science and Technology
  • Yixin Liu Lehigh University
  • Ninghao Liu University of Georgia
  • Mengdi Huai Iowa State University
  • Lichao Sun Lehigh University
  • Di Wang King Abdullah University of Science and Technology; Computational Bioscience Research Center; SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence



SNLP: Interpretability & Analysis of NLP Models, ML: Transparent, Interpretable, Explainable ML


The attention mechanism has become a standard fixture in many state-of-the-art natural language processing (NLP) models, not only because of its outstanding performance but also because it provides plausible innate explanations for neural architectures. However, recent studies show that attention is unstable against randomness and perturbations during training or testing, such as random seeds and slight perturbations of embeddings, which prevents it from being a faithful explanation tool. Thus, a natural question is whether we can find an alternative to vanilla attention that is more stable and still keeps the key characteristics of the explanation. In this paper, we provide a rigorous definition of such an attention method, named SEAT (Stable and Explainable ATtention). Specifically, SEAT has the following three properties: (1) its prediction distribution is close to that of the vanilla attention; (2) its top-k indices largely overlap with those of the vanilla attention; (3) it is robust w.r.t. perturbations, i.e., any slight perturbation on SEAT will not change the attention and prediction distribution too much, which implicitly indicates that it is stable to randomness and perturbations. Furthermore, we propose an optimization method for obtaining SEAT, which can be viewed as revising the vanilla attention. Finally, through extensive experiments on various datasets, we compare SEAT with other baseline methods using RNN, BiLSTM and BERT architectures, with different evaluation metrics on model interpretation, stability and accuracy. Results show that, besides preserving the original explainability and model performance, SEAT is more stable against input perturbations and training randomness, which indicates that it is a more faithful explanation.
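As a rough illustration of properties (1) and (2) above, the sketch below compares a candidate attention vector against a vanilla one. It is not the paper's implementation: the abstract does not say which distance is used between prediction distributions, so total variation distance is assumed here, and the function names and example numbers are hypothetical.

```python
from typing import Sequence, Set


def top_k_indices(weights: Sequence[float], k: int) -> Set[int]:
    """Return the indices of the k largest weights (ties go to the lower index)."""
    order = sorted(range(len(weights)), key=lambda i: (-weights[i], i))
    return set(order[:k])


def top_k_overlap(w_a: Sequence[float], w_b: Sequence[float], k: int) -> float:
    """Fraction of top-k indices shared by two attention vectors, in [0, 1]."""
    if len(w_a) != len(w_b):
        raise ValueError("attention vectors must have the same length")
    if not 0 < k <= len(w_a):
        raise ValueError("k must be between 1 and the vector length")
    return len(top_k_indices(w_a, k) & top_k_indices(w_b, k)) / k


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance between two discrete distributions.

    Assumed metric: the abstract only says the distributions are "close".
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical attention weights over five tokens.
vanilla_attention = [0.50, 0.30, 0.10, 0.06, 0.04]
candidate_attention = [0.45, 0.10, 0.32, 0.08, 0.05]

# Hypothetical two-class prediction distributions from each model.
vanilla_prediction = [0.70, 0.30]
candidate_prediction = [0.65, 0.35]

overlap_at_2 = top_k_overlap(vanilla_attention, candidate_attention, k=2)
overlap_at_3 = top_k_overlap(vanilla_attention, candidate_attention, k=3)
prediction_gap = total_variation(vanilla_prediction, candidate_prediction)
```

Property (3) could be probed with the same helpers by perturbing the input embeddings slightly and measuring how much the attention and prediction of a single model move; the perturbation model and thresholds are left to the full paper.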




How to Cite

Hu, L., Liu, Y., Liu, N., Huai, M., Sun, L., & Wang, D. (2023). SEAT: Stable and Explainable Attention. Proceedings of the AAAI Conference on Artificial Intelligence, 37(11), 12907-12915.



AAAI Technical Track on Speech & Natural Language Processing