Towards Automating Model Explanations with Certified Robustness Guarantees

Mengdi Huai; Jinduo Liu; Chenglin Miao; Liuyi Yao; Aidong Zhang

doi:10.1609/aaai.v36i6.20651

Authors

Mengdi Huai, University of Virginia
Jinduo Liu, Beijing University of Technology
Chenglin Miao, University of Georgia
Liuyi Yao, Alibaba Group
Aidong Zhang, University of Virginia

DOI:

https://doi.org/10.1609/aaai.v36i6.20651

Keywords:

Machine Learning (ML)

Abstract

Providing model explanations has gained significant popularity recently. In contrast with traditional feature-level model explanations, concept-based explanations describe model behavior in terms of high-level human concepts. However, existing concept-based explanation methods implicitly follow a two-step procedure that involves human intervention. Specifically, they first require a human to define (or extract) the high-level concepts, and then the importance scores of these identified concepts are computed manually in a post-hoc way. This laborious process requires significant human effort and resources, which hinders large-scale deployment. In practice, it is challenging to generate concept-based explanations automatically, without human intervention, because defining the units of concept-based interpretability is subjective. In addition, due to its data-driven nature, interpretability itself is potentially susceptible to malicious manipulation. Hence, our goal in this paper is to free humans from this tedious process while ensuring that the generated explanations are provably robust to adversarial perturbations. We propose a novel concept-based interpretation method that not only provides prototype-based concept explanations automatically but also provides certified robustness guarantees for the generated explanations. We also conduct extensive experiments on real-world datasets to verify the desirable properties of the proposed method.
