Counterfactual Explanations for Misclassified Images: How Human and Machine Explanations Differ (Abstract Reprint)

Authors

  • Eoin Delaney: School of Computer Science, University College Dublin, Belfield, Dublin, Ireland; Insight Centre for Data Analytics, Belfield, Dublin, Ireland; VistaMilk SFI Research Centre, Belfield, Dublin, Ireland
  • Arjun Pakrashi: School of Computer Science, University College Dublin, Belfield, Dublin, Ireland; VistaMilk SFI Research Centre, Belfield, Dublin, Ireland
  • Derek Greene: School of Computer Science, University College Dublin, Belfield, Dublin, Ireland; Insight Centre for Data Analytics, Belfield, Dublin, Ireland; VistaMilk SFI Research Centre, Belfield, Dublin, Ireland
  • Mark T. Keane: School of Computer Science, University College Dublin, Belfield, Dublin, Ireland; Insight Centre for Data Analytics, Belfield, Dublin, Ireland; VistaMilk SFI Research Centre, Belfield, Dublin, Ireland

DOI:

https://doi.org/10.1609/aaai.v38i20.30596

Keywords:

Journal Track

Abstract

Counterfactual explanations have emerged as a popular solution for the eXplainable AI (XAI) problem of elucidating the predictions of black-box deep-learning systems because people easily understand them, they apply across different problem domains, and they seem to be legally compliant. Although over 100 counterfactual methods exist in the XAI literature, each claiming to generate plausible explanations akin to those preferred by people, few of these methods (∼7%) have actually been tested on users. Even fewer studies adopt a user-centered perspective, for instance by asking people for their own counterfactual explanations to determine what they consider a “good explanation”. This gap in the literature is addressed here using a novel methodology that (i) gathers human-generated counterfactual explanations for misclassified images in two user studies, and then (ii) compares these human-generated explanations to computationally generated explanations for the same misclassifications. Results indicate that humans do not “minimally edit” images when generating counterfactual explanations. Instead, they make larger, “meaningful” edits that better approximate prototypes in the counterfactual class. An analysis based on “explanation goals” is proposed to account for this divergence between human and machine explanations. The implications of these proposals for future work are discussed.
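
The contrast between “minimal” edits and edits that approach a class prototype can be illustrated with a small sketch. The abstract does not state which measures the authors used, so everything here is an assumption made for illustration: images are flat lists of pixel intensities, edit size is Euclidean distance from the original image, the prototype is the pixel-wise mean of the counterfactual class, and the function names (`l2_distance`, `class_prototype`, `compare_edits`) and the toy data are hypothetical.

```python
from math import sqrt


def l2_distance(a, b):
    """Euclidean distance between two equally sized flat pixel lists."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def class_prototype(images):
    """Pixel-wise mean of a non-empty list of flat images.

    Assumption: a mean image is only a simple stand-in for a class
    prototype; the paper may define prototypes differently.
    """
    n = len(images)
    return [sum(pixels) / n for pixels in zip(*images)]


def compare_edits(original, human_cf, machine_cf, target_class_images):
    """Report edit size and prototype distance for two counterfactuals."""
    prototype = class_prototype(target_class_images)
    return {
        "human_edit_size": l2_distance(original, human_cf),
        "machine_edit_size": l2_distance(original, machine_cf),
        "human_prototype_distance": l2_distance(human_cf, prototype),
        "machine_prototype_distance": l2_distance(machine_cf, prototype),
    }


# Toy 4-pixel "images" (made-up values, not data from the study).
original = [0.0, 0.0, 0.0, 0.0]
machine_cf = [0.1, 0.0, 0.0, 0.0]   # small, "minimal" edit
human_cf = [0.8, 0.9, 0.0, 0.0]     # larger edit toward the target class
target_class_images = [[1.0, 1.0, 0.0, 0.0], [0.8, 1.0, 0.0, 0.0]]

report = compare_edits(original, human_cf, machine_cf, target_class_images)
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

In this toy setup the hypothetical human edit is larger than the machine edit but ends up closer to the class mean, which is the pattern the abstract describes; the numbers themselves carry no empirical weight.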

Published

2024-03-24

How to Cite

Delaney, E., Pakrashi, A., Greene, D., & Keane, M. T. (2024). Counterfactual Explanations for Misclassified Images: How Human and Machine Explanations Differ (Abstract Reprint). Proceedings of the AAAI Conference on Artificial Intelligence, 38(20), 22696-22696. https://doi.org/10.1609/aaai.v38i20.30596