A Unified Taylor Framework for Revisiting Attribution Methods

Authors

  • Huiqi Deng Sun Yat-Sen University
  • Na Zou Texas A&M University
  • Mengnan Du Texas A&M University
  • Weifu Chen Sun Yat-Sen University
  • Guocan Feng Sun Yat-Sen University
  • Xia Hu Texas A&M University

DOI:

https://doi.org/10.1609/aaai.v35i13.17365

Keywords:

Accountability, Interpretability & Explainability

Abstract

Attribution methods have been developed to understand the decision-making process of machine learning models, especially deep neural networks, by assigning importance scores to individual features. Existing attribution methods are often built upon empirical intuitions and heuristics. There is still no general theoretical framework that can both unify these attribution methods and reveal their rationales, fidelity, and limitations. To bridge this gap, we propose a Taylor attribution framework and reformulate seven mainstream attribution methods within it. Based on these reformulations, we analyze the attribution methods in terms of rationale, fidelity, and limitations. Moreover, we establish three principles for a good attribution in the Taylor attribution framework: low approximation error, correct contribution assignment, and unbiased baseline selection. Finally, we empirically validate the Taylor reformulations and, by benchmarking on real-world datasets, reveal a positive correlation between attribution performance and the number of principles an attribution method follows.
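
The abstract does not spell out the framework's equations, so the following is only a minimal sketch of the general idea of a Taylor-style attribution, not the paper's formulation. It assumes a first-order expansion around a baseline, with each feature's score taken as the partial derivative at the baseline times the feature's displacement from it. The names `numerical_gradient`, `first_order_taylor_attribution`, `approximation_error`, and `toy_model` are hypothetical and chosen for illustration.

```python
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def numerical_gradient(f: Model, point: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Estimate the gradient of f at `point` with central differences."""
    grad = []
    for i in range(len(point)):
        up, down = list(point), list(point)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def first_order_taylor_attribution(
    f: Model, x: Sequence[float], baseline: Sequence[float]
) -> List[float]:
    """Assumed attribution rule: gradient at the baseline times (x - baseline)."""
    grad = numerical_gradient(f, baseline)
    return [g * (xi - bi) for g, xi, bi in zip(grad, x, baseline)]


def approximation_error(
    f: Model, x: Sequence[float], baseline: Sequence[float], attributions: Sequence[float]
) -> float:
    """Part of f(x) - f(baseline) that the attributions fail to explain."""
    return f(x) - f(baseline) - sum(attributions)


def toy_model(v: Sequence[float]) -> float:
    """Hypothetical model with one linear term and one interaction term."""
    return 3.0 * v[0] + v[0] * v[1]


x = [1.0, 2.0]
baseline = [0.0, 0.0]
attributions = first_order_taylor_attribution(toy_model, x, baseline)
error = approximation_error(toy_model, x, baseline, attributions)
```

For this toy model the attributions come out to approximately `[3.0, 0.0]` and the error to approximately `2.0`: the first-order expansion captures the linear term but misses the interaction `v[0] * v[1]` entirely. This is one way to picture why low approximation error is listed as a principle for a good attribution.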

Published

2021-05-18

How to Cite

Deng, H., Zou, N., Du, M., Chen, W., Feng, G., & Hu, X. (2021). A Unified Taylor Framework for Revisiting Attribution Methods. Proceedings of the AAAI Conference on Artificial Intelligence, 35(13), 11462-11469. https://doi.org/10.1609/aaai.v35i13.17365

Section

AAAI Technical Track on Philosophy and Ethics of AI