CoCoX: Generating Conceptual and Counterfactual Explanations via Fault-Lines

Arjun Akula; Shuai Wang; Song-Chun Zhu

doi:10.1609/aaai.v34i03.5643

Authors

Arjun Akula University of California, Los Angeles
Shuai Wang University of Illinois at Chicago
Song-Chun Zhu University of California, Los Angeles

DOI:

https://doi.org/10.1609/aaai.v34i03.5643

Abstract

We present CoCoX (short for Conceptual and Counterfactual Explanations), a model for explaining decisions made by a deep convolutional neural network (CNN). In Cognitive Psychology, the factors (or semantic-level features) that humans zoom in on when they imagine an alternative to a model prediction are often referred to as fault-lines. Motivated by this, our CoCoX model explains decisions made by a CNN using fault-lines. Specifically, given an input image I for which a CNN classification model M predicts class c_pred, our fault-line based explanation identifies the minimal semantic-level features (e.g., stripes on zebra, pointed ears of dog), referred to as explainable concepts, that need to be added to or deleted from I in order to alter the classification category of I by M to another specified class c_alt. We argue that, due to the conceptual and counterfactual nature of fault-lines, our CoCoX explanations are practical and more natural for both expert and non-expert users to understand the internal workings of complex deep learning models. Extensive quantitative and qualitative experiments verify our hypotheses, showing that CoCoX significantly outperforms the state-of-the-art explainable AI models. Our implementation is available at https://github.com/arjunakula/CoCoX
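The fault-line definition above can be illustrated with a toy sketch. It is not the authors' implementation: it assumes a hypothetical linear scorer over a handful of named concepts (the `weights` table, the concept names, and the functions `predict` and `minimal_fault_line` are invented for illustration) and uses exhaustive search, which is feasible only for very small concept sets. It shows only the idea of finding the smallest set of concepts to add to or delete from an input so that the predicted class changes from c_pred to c_alt.

```python
from itertools import combinations


def predict(weights, present):
    """Toy classifier: pick the class with the largest summed concept weight."""
    scores = {c: sum(w[k] for k in present) for c, w in weights.items()}
    return max(scores, key=scores.get)


def minimal_fault_line(weights, present, c_alt):
    """Return (added, deleted) for the smallest set of concept flips that
    makes the toy classifier predict c_alt, or None if no such set exists."""
    concepts = sorted(next(iter(weights.values())))
    # Try flip sets in order of increasing size, so the first hit is minimal.
    for size in range(len(concepts) + 1):
        for flips in combinations(concepts, size):
            edited = set(present) ^ set(flips)  # flip presence of each concept
            if predict(weights, edited) == c_alt:
                added = sorted(set(flips) - set(present))
                deleted = sorted(set(flips) & set(present))
                return added, deleted
    return None


# Hypothetical per-class concept weights (assumed values, not from the paper).
weights = {
    "dog": {"pointed_ears": 2.0, "stripes": -1.0, "mane": 0.0},
    "zebra": {"pointed_ears": -0.5, "stripes": 3.0, "mane": 0.5},
}
present = {"pointed_ears"}  # concepts detected in the input image I
result = minimal_fault_line(weights, present, "zebra")
print(result)
```

In this toy setting the input is classified as "dog", and the search reports that adding the single concept "stripes" is enough to change the prediction to "zebra". A real system would need to mine concepts from the CNN's feature maps and estimate their influence on each class, which this sketch does not attempt.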
