FIMAP: Feature Importance by Minimal Adversarial Perturbation

Matt Chapman-Rounds; Umang Bhatt; Erik Pazos; Marc-Andre Schulz; Konstantinos Georgatzis

doi:10.1609/aaai.v35i13.17362

Authors

Matt Chapman-Rounds University of Edinburgh
Umang Bhatt University of Cambridge
Erik Pazos QuantumBlack
Marc-Andre Schulz Department of Psychiatry and Psychotherapy, Charité–Universitätsmedizin Berlin
Konstantinos Georgatzis QuantumBlack

DOI:

https://doi.org/10.1609/aaai.v35i13.17362

Keywords:

Accountability, Interpretability & Explainability

Abstract

Instance-based model-agnostic feature importance explanations (LIME, SHAP, L2X) are a popular form of algorithmic transparency. These methods generally return either a weighting or subset of input features as an explanation for the classification of an instance. An alternative literature argues instead that counterfactual instances, which alter the black-box model's classification, provide a more actionable form of explanation. We present Feature Importance by Minimal Adversarial Perturbation (FIMAP), a neural network based approach that unifies feature importance and counterfactual explanations. We show that this approach combines the two paradigms, recovering the output of feature-weighting methods in continuous feature spaces, whilst indicating the direction in which the nearest counterfactuals can be found. Our method also provides an implicit confidence estimate in its own explanations, something existing methods lack. Additionally, FIMAP improves upon the speed of sampling-based methods, such as LIME, by an order of magnitude, allowing for explanation deployment in time-critical applications. We extend our approach to categorical features using a partitioned Gumbel layer and demonstrate its efficacy on standard datasets.
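
To make concrete the idea that a minimal adversarial perturbation can double as a feature-importance explanation, the sketch below works through a toy case. It is not the FIMAP method, which the abstract describes as a neural network based approach. Instead it assumes a hypothetical linear classifier with score w·x + b, for which the smallest L2 perturbation that crosses the decision boundary has a closed form. The names `minimal_perturbation`, `predict`, `overshoot`, and all weights and inputs are illustrative assumptions.

```python
import numpy as np

def minimal_perturbation(x, w, b, overshoot=1e-3):
    """Smallest L2 perturbation that moves x across the boundary w.x + b = 0.

    Assumes a linear classifier; this is a toy stand-in, not FIMAP itself.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    margin = w @ x + b                   # signed score of the instance
    delta = -(margin / (w @ w)) * w      # orthogonal projection onto the boundary
    return (1.0 + overshoot) * delta     # step slightly past the boundary

def predict(x, w, b):
    """Class label (0 or 1) of the hypothetical linear classifier."""
    return int(np.asarray(w) @ np.asarray(x) + b > 0)

# Hypothetical classifier in which the second feature carries no weight.
w = np.array([2.0, 0.0, -1.0])
b = -0.5
x = np.array([1.0, 3.0, 0.5])

delta = minimal_perturbation(x, w, b)

# Normalised perturbation magnitudes act as feature importances;
# the sign of each component gives the direction of the counterfactual.
importance = np.abs(delta) / np.abs(delta).sum()
counterfactual = x + delta

print("importance:", importance)
print("direction:", np.sign(delta))
print("label before/after:", predict(x, w, b), predict(counterfactual, w, b))
```

In this toy setting, the feature with zero weight receives zero importance, and the perturbed instance is assigned the opposite class, which is the sense in which one perturbation yields both a feature weighting and a counterfactual direction.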
