FIMAP: Feature Importance by Minimal Adversarial Perturbation


  • Matt Chapman-Rounds University of Edinburgh
  • Umang Bhatt University of Cambridge
  • Erik Pazos QuantumBlack
  • Marc-Andre Schulz Department of Psychiatry and Psychotherapy, Charité–Universitätsmedizin Berlin
  • Konstantinos Georgatzis QuantumBlack


Accountability, Interpretability & Explainability


Instance-based model-agnostic feature importance explanations (LIME, SHAP, L2X) are a popular form of algorithmic transparency. These methods generally return either a weighting or subset of input features as an explanation for the classification of an instance. An alternative literature argues instead that counterfactual instances, which alter the black-box model's classification, provide a more actionable form of explanation. We present Feature Importance by Minimal Adversarial Perturbation (FIMAP), a neural-network-based approach that unifies feature importance and counterfactual explanations. We show that this approach combines the two paradigms, recovering the output of feature-weighting methods in continuous feature spaces, whilst indicating the direction in which the nearest counterfactuals can be found. Our method also provides an implicit confidence estimate in its own explanations, something existing methods lack. Additionally, FIMAP improves upon the speed of sampling-based methods, such as LIME, by an order of magnitude, allowing for explanation deployment in time-critical applications. We extend our approach to categorical features using a partitioned Gumbel layer and demonstrate its efficacy on standard datasets.
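The core idea of explanation-by-minimal-perturbation can be illustrated in a toy setting. The sketch below is not the paper's method (FIMAP trains a neural network to produce perturbations; here we use plain per-instance gradient descent on a hand-built logistic classifier), but it shows the shared intuition: the smallest perturbation that flips the classifier's decision doubles as a feature-importance vector, with its sign pointing toward the nearest counterfactual. All names and parameters (`minimal_perturbation`, `lam`, `lr`) are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def minimal_perturbation(x, w, b, lam=5.0, lr=0.1, steps=500):
    """Toy sketch: find a small delta that flips the prediction of a
    logistic classifier f(x) = sigmoid(w @ x + b).

    Minimizes ||delta||^2 + lam * CE(f(x + delta), flipped_class),
    trading off perturbation size against crossing the boundary.
    """
    orig_class = sigmoid(w @ x + b) >= 0.5
    target = 0.0 if orig_class else 1.0  # the flipped class
    delta = np.zeros_like(x)
    for _ in range(steps):
        p = sigmoid(w @ (x + delta) + b)
        # d/d(delta) of [lam * cross-entropy toward target + ||delta||^2]
        grad = lam * (p - target) * w + 2.0 * delta
        delta -= lr * grad
    return delta

# Features with zero weight receive zero perturbation, so |delta|
# acts as an importance score and sign(delta) gives the direction
# toward the nearest counterfactual.
w = np.array([2.0, -1.0, 0.0])  # third feature is irrelevant
b = 0.0
x = np.array([1.0, 0.0, 5.0])
delta = minimal_perturbation(x, w, b)
```

In this example the irrelevant third feature is left untouched, while the most heavily weighted feature receives the largest perturbation, mirroring how FIMAP's perturbations recover feature-weighting-style explanations in continuous spaces.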




How to Cite

Chapman-Rounds, M., Bhatt, U., Pazos, E., Schulz, M.-A., & Georgatzis, K. (2021). FIMAP: Feature Importance by Minimal Adversarial Perturbation. Proceedings of the AAAI Conference on Artificial Intelligence, 35(13), 11433–11441.



AAAI Technical Track on Philosophy and Ethics of AI