AIDE: Antithetical, Intent-based, and Diverse Example-Based Explanations
DOI:
https://doi.org/10.1609/aies.v7i1.31702
Abstract
For many use cases, it is important to explain the prediction of a black-box model by identifying the most influential training data samples. Existing approaches lack customization for user intent and often provide a homogeneous set of explanation samples, failing to reveal the model's reasoning from different angles. In this paper, we propose AIDE, an approach for providing antithetical (i.e., contrastive), intent-based, diverse explanations for opaque and complex models. AIDE distinguishes three types of explainability intents: interpreting a correct prediction, investigating a wrong one, and clarifying an ambiguous one. For each intent, AIDE selects an appropriate set of influential training samples that support or oppose the prediction either directly or by contrast. To provide a succinct summary, AIDE uses diversity-aware sampling to avoid redundancy and increase coverage of the training data. We demonstrate the effectiveness of AIDE on image and text classification tasks in three ways: quantitatively, assessing correctness and continuity; qualitatively, comparing anecdotal evidence from AIDE and other example-based approaches; and via a user study, evaluating multiple aspects of AIDE. The results show that AIDE addresses the limitations of existing methods and exhibits desirable traits for an explainability method.
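To make the selection step concrete, below is a minimal sketch of intent- and diversity-aware example selection in the spirit of what the abstract describes. It assumes precomputed signed influence scores (positive values support the prediction, negative values oppose it) and per-sample embeddings; the function name, the greedy max-min diversity rule, and all parameters are illustrative assumptions, not the paper's actual algorithm or API.

import numpy as np

def select_examples(influence, embeddings, k=5, support=True):
    """Pick k influential, mutually diverse training samples.

    influence:  (n,) signed influence scores of training samples on the
                prediction (positive = supports, negative = opposes).
    embeddings: (n, d) feature vectors used to measure diversity.
    """
    # Keep only samples on the requested side of the prediction.
    mask = influence > 0 if support else influence < 0
    cand = np.flatnonzero(mask)
    if cand.size == 0:
        return []

    # Seed with the single most influential candidate.
    order = cand[np.argsort(-np.abs(influence[cand]))]
    chosen = [order[0]]

    # Greedily add the candidate farthest from everything chosen so far
    # (max-min distance), trading raw influence rank for coverage.
    while len(chosen) < min(k, cand.size):
        rest = np.setdiff1d(order, chosen)
        dists = np.linalg.norm(
            embeddings[rest][:, None, :] - embeddings[chosen][None, :, :],
            axis=-1,
        ).min(axis=1)
        chosen.append(rest[int(np.argmax(dists))])
    return chosen

Under these assumptions, supporting examples come from influence > 0 and antithetical (contrastive) ones from influence < 0; e.g., select_examples(scores, emb, k=5, support=False) would return a diverse set of opposing samples.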
Published
2024-10-16
How to Cite
Nematov, I., Sacharidis, D., Hose, K., & Sagi, T. (2024). AIDE: Antithetical, Intent-based, and Diverse Example-Based Explanations. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 7(1), 1051-1062. https://doi.org/10.1609/aies.v7i1.31702
Issue
Vol. 7 No. 1 (2024)
Section
Full Archival Papers