Learning Robust Rationales for Model Explainability: A Guidance-Based Approach
DOI:
https://doi.org/10.1609/aaai.v38i16.29783
Keywords:
NLP: Interpretability, Analysis, and Evaluation of NLP Models
Abstract
Selective rationalization can be regarded as a straightforward self-explaining approach for enhancing model explainability in natural language processing tasks. It aims to provide explanations that are more accessible and understandable to non-technical users by first selecting subsets of the input text as rationales and then making predictions based on the chosen subsets. However, existing methods that follow this select-then-predict framework may suffer from the rationalization degeneration problem, producing sub-optimal or unsatisfactory rationales that do not align with human judgments. This problem may further lead to rationalization failure, yielding meaningless rationales that ultimately undermine people's trust in the rationalization model. To address these challenges, we propose a Guidance-based Rationalization method (G-RAT) that effectively improves robustness against failure situations and the quality of rationales by using a guidance module to regularize selections and distributions. Experimental results in two synthetic settings show that our method is robust to the rationalization degeneration and failure problems, while results on two real datasets show its effectiveness in providing rationales in line with human judgments. The source code is available at https://github.com/shuaibo919/g-rat.
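The abstract describes the select-then-predict framework that G-RAT builds on: a selector extracts a subset of input tokens as the rationale, and a predictor classifies from that subset alone. The sketch below is a minimal, generic illustration of that framework in PyTorch, not the paper's guidance module; the module names, dimensions, Gumbel-softmax selection, and sparsity penalty are illustrative assumptions.

```python
# Generic select-then-predict rationalization sketch (illustrative only;
# does not implement G-RAT's guidance module).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelectThenPredict(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Selector: scores each token for inclusion in the rationale.
        self.selector_rnn = nn.GRU(embed_dim, hidden_dim,
                                   batch_first=True, bidirectional=True)
        self.selector_head = nn.Linear(2 * hidden_dim, 2)  # drop/keep logits per token
        # Predictor: classifies from the selected (masked) tokens only.
        self.predictor_rnn = nn.GRU(embed_dim, hidden_dim,
                                    batch_first=True, bidirectional=True)
        self.predictor_head = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        emb = self.embed(token_ids)                      # (batch, seq, embed_dim)
        sel_states, _ = self.selector_rnn(emb)
        logits = self.selector_head(sel_states)          # (batch, seq, 2)
        # Hard binary mask with straight-through gradients (Gumbel-softmax trick).
        mask = F.gumbel_softmax(logits, tau=1.0, hard=True)[..., 1:]  # (batch, seq, 1)
        rationale = emb * mask                           # zero out unselected tokens
        pred_states, _ = self.predictor_rnn(rationale)
        pooled = pred_states.max(dim=1).values
        return self.predictor_head(pooled), mask.squeeze(-1)

# Toy usage: task loss plus a sparsity penalty that keeps rationales short.
model = SelectThenPredict(vocab_size=1000)
tokens = torch.randint(0, 1000, (4, 20))
labels = torch.randint(0, 2, (4,))
logits, mask = model(tokens)
loss = F.cross_entropy(logits, labels) + 0.1 * mask.mean()
loss.backward()
```

Because the predictor sees only the masked tokens, degenerate selectors can encode the label through the mask pattern itself; regularizing the selector's choices (as the abstract's guidance module does) targets exactly this failure mode.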
Published
2024-03-24
How to Cite
Hu, S., & Yu, K. (2024). Learning Robust Rationales for Model Explainability: A Guidance-Based Approach. Proceedings of the AAAI Conference on Artificial Intelligence, 38(16), 18243-18251. https://doi.org/10.1609/aaai.v38i16.29783
Section
AAAI Technical Track on Natural Language Processing I