Explaining Model Confidence Using Counterfactuals

Authors

  • Thao Le, The University of Melbourne
  • Tim Miller, The University of Melbourne
  • Ronal Singh, The University of Melbourne
  • Liz Sonenberg, The University of Melbourne

DOI:

https://doi.org/10.1609/aaai.v37i10.26399

Keywords:

PEAI: Interpretability and Explainability, HAI: Human-Computer Interaction

Abstract

Displaying confidence scores in human-AI interaction has been shown to help build trust between humans and AI systems. However, most existing research uses only the confidence score as a form of communication. Because confidence scores are just another model output, users may want to understand why the algorithm is confident in order to decide whether to accept the confidence score. In this paper, we show that counterfactual explanations of confidence scores help study participants better understand and better trust a machine learning model's prediction. We present two methods for understanding model confidence using counterfactual explanation: (1) based on counterfactual examples; and (2) based on visualisation of the counterfactual space. Both increase understanding and trust for study participants over a baseline of no explanation, but qualitative results show that they are used quite differently, leading to recommendations on when to use each one and directions for designing better explanations.
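To make the first method concrete, the sketch below illustrates the general idea of a counterfactual example for a confidence score: find a small change to one feature that would noticeably lower the model's confidence, yielding an explanation of the form "if feature j were v, the confidence would drop from p to p'". This is a minimal illustration, not the authors' algorithm; the dataset, classifier, and greedy one-feature grid search are assumptions made for the example.

```python
# Minimal sketch (not the paper's method): a greedy one-feature search for a
# counterfactual example that lowers a classifier's confidence, to illustrate
# explanations like "if feature j were v, confidence would fall from p to p'".
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)          # illustrative dataset
model = LogisticRegression(max_iter=5000).fit(X, y)  # illustrative model

def confidence(x):
    """Confidence = probability of the predicted class for one instance."""
    proba = model.predict_proba(x.reshape(1, -1))[0]
    return proba.max(), proba.argmax()

def confidence_counterfactual(x, target_drop=0.2, n_steps=20):
    """For each feature, scan values within the training range and return the
    smallest (normalised) change that lowers confidence by at least
    `target_drop` while keeping the same predicted class."""
    base_conf, base_label = confidence(x)
    best = None
    for j in range(x.shape[0]):
        lo, hi = X[:, j].min(), X[:, j].max()
        for v in np.linspace(lo, hi, n_steps):
            x_cf = x.copy()
            x_cf[j] = v
            conf, label = confidence(x_cf)
            if label == base_label and base_conf - conf >= target_drop:
                dist = abs(v - x[j]) / (hi - lo + 1e-12)
                if best is None or dist < best[0]:
                    best = (dist, j, v, conf)
    return None if best is None else best[1:]

x0 = X[0]
p0, _ = confidence(x0)
cf = confidence_counterfactual(x0)
if cf is not None:
    j, v, p_cf = cf
    print(f"Confidence {p0:.2f}; if feature {j} were {v:.2f}, "
          f"confidence would fall to {p_cf:.2f}.")
```

The second method in the paper instead visualises the counterfactual space; the same per-feature scan above could be plotted as confidence versus feature value rather than reduced to a single counterfactual example.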

Published

2023-06-26

How to Cite

Le, T., Miller, T., Singh, R., & Sonenberg, L. (2023). Explaining Model Confidence Using Counterfactuals. Proceedings of the AAAI Conference on Artificial Intelligence, 37(10), 11856-11864. https://doi.org/10.1609/aaai.v37i10.26399

Section

AAAI Technical Track on Philosophy and Ethics of AI