CoCoX: Generating Conceptual and Counterfactual Explanations via Fault-Lines


  • Arjun Akula, University of California, Los Angeles
  • Shuai Wang, University of Illinois at Chicago
  • Song-Chun Zhu, University of California, Los Angeles



We present CoCoX (short for Conceptual and Counterfactual Explanations), a model for explaining decisions made by a deep convolutional neural network (CNN). In Cognitive Psychology, the factors (or semantic-level features) that humans zoom in on when they imagine an alternative to a model prediction are often referred to as fault-lines. Motivated by this, our CoCoX model explains decisions made by a CNN using fault-lines. Specifically, given an input image I for which a CNN classification model M predicts class c_pred, our fault-line based explanation identifies the minimal semantic-level features (e.g., stripes on zebra, pointed ears of dog), referred to as explainable concepts, that need to be added to or deleted from I in order to alter the classification category of I by M to another specified class c_alt. We argue that, due to the conceptual and counterfactual nature of fault-lines, our CoCoX explanations are practical and more natural for both expert and non-expert users to understand the internal workings of complex deep learning models. Extensive quantitative and qualitative experiments verify our hypotheses, showing that CoCoX significantly outperforms the state-of-the-art explainable AI models. Our implementation is available at
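
The fault-line idea described above (the smallest set of explainable concepts to add or delete so that the predicted class changes to the specified alternative) can be illustrated with a toy sketch. This is not the authors' method: the classifier, the concept names, and the helper `find_fault_line` are hypothetical stand-ins, and the brute-force subset search is used only because it makes "minimal" easy to see. It assumes an image can be summarized as a set of present concepts.

```python
from itertools import combinations


def find_fault_line(concepts, present, predict, c_alt):
    """Return (added, deleted) for the smallest concept edit that yields c_alt.

    concepts: list of all candidate concept names (hypothetical)
    present:  set of concepts currently present in the image
    predict:  callable mapping a frozenset of concepts to a class label
    c_alt:    the alternative class the explanation should reach
    Returns None if no combination of edits produces c_alt.
    """
    # Try edit sets in order of increasing size, so the first hit is minimal.
    for k in range(len(concepts) + 1):
        for subset in combinations(concepts, k):
            edited = set(present)
            for concept in subset:
                edited ^= {concept}  # toggle: add if absent, delete if present
            if predict(frozenset(edited)) == c_alt:
                added = sorted(c for c in subset if c not in present)
                deleted = sorted(c for c in subset if c in present)
                return added, deleted
    return None


def toy_classifier(present):
    """Hypothetical stand-in for the CNN M, operating on concept sets."""
    if "stripes" in present:
        return "zebra"
    if "pointed_ears" in present:
        return "dog"
    return "horse"


concepts = ["stripes", "pointed_ears", "mane"]
image_concepts = {"mane"}

c_pred = toy_classifier(frozenset(image_concepts))
fault_line = find_fault_line(concepts, image_concepts, toy_classifier, "zebra")
print(c_pred, fault_line)
```

Here the explanation for "why horse and not zebra" is the single added concept `stripes`. Exhaustive search grows exponentially with the number of concepts, so it is only suitable for a small illustrative vocabulary like this one.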




How to Cite

Akula, A., Wang, S., & Zhu, S.-C. (2020). CoCoX: Generating Conceptual and Counterfactual Explanations via Fault-Lines. Proceedings of the AAAI Conference on Artificial Intelligence, 34(03), 2594-2601.



AAAI Technical Track: Humans and AI