FIMAP: Feature Importance by Minimal Adversarial Perturbation
Keywords: Accountability, Interpretability & Explainability
Abstract
Instance-based model-agnostic feature importance explanations (LIME, SHAP, L2X) are a popular form of algorithmic transparency. These methods generally return either a weighting or a subset of input features as an explanation for the classification of an instance. An alternative literature argues instead that counterfactual instances, which alter the black-box model's classification, provide a more actionable form of explanation. We present Feature Importance by Minimal Adversarial Perturbation (FIMAP), a neural-network-based approach that unifies feature importance and counterfactual explanations. We show that this approach combines the two paradigms, recovering the output of feature-weighting methods in continuous feature spaces, whilst indicating the direction in which the nearest counterfactuals can be found. Our method also provides an implicit confidence estimate in its own explanations, something existing methods lack. Additionally, FIMAP improves upon the speed of sampling-based methods, such as LIME, by an order of magnitude, allowing for explanation deployment in time-critical applications. We extend our approach to categorical features using a partitioned Gumbel layer and demonstrate its efficacy on standard datasets.
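To illustrate why a minimal class-flipping perturbation can double as a feature weighting, the sketch below works through the simplest case, a linear classifier, where the smallest L2 perturbation has a closed form. This is not the FIMAP method itself, which trains a neural network to produce perturbations for a black-box model; the function names, weights, and input here are hypothetical and chosen only for illustration.

```python
import math

def predict(x, w, b):
    """Hypothetical linear classifier: class 1 if w.x + b > 0, else class 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def minimal_perturbation(x, w, b, margin=1e-6):
    """Smallest L2 change to x that flips the sign of w.x + b.

    Closed form for a linear model only: move along w until the decision
    boundary is reached, then step a tiny margin past it.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm_sq = sum(wi * wi for wi in w)
    if norm_sq == 0:
        raise ValueError("weight vector must be non-zero")
    scale = -(score + math.copysign(margin, score)) / norm_sq
    return [scale * wi for wi in w]

# Illustrative values (assumptions, not taken from the paper).
w = [2.0, -1.0, 0.0]
b = -0.5
x = [1.0, 0.5, 3.0]

delta = minimal_perturbation(x, w, b)
counterfactual = [xi + di for xi, di in zip(x, delta)]

# The magnitude of each component acts as a feature importance score;
# its sign gives the direction towards the nearest counterfactual.
importance = [abs(d) for d in delta]
```

In this linear setting the perturbation is proportional to the weight vector, so a feature the model ignores (the third one here) receives zero importance, while the signed perturbation points towards the nearest counterfactual.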
How to Cite
Chapman-Rounds, M., Bhatt, U., Pazos, E., Schulz, M.-A., & Georgatzis, K. (2021). FIMAP: Feature Importance by Minimal Adversarial Perturbation. Proceedings of the AAAI Conference on Artificial Intelligence, 35(13), 11433-11441. Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/17362
AAAI Technical Track on Philosophy and Ethics of AI