On the Importance of Application-Grounded Experimental Design for Evaluating Explainable ML Methods

Authors

  • Kasun Amarasinghe, Carnegie Mellon University, Pittsburgh, PA
  • Kit T. Rodolfa, Stanford University, Palo Alto, CA
  • Sérgio Jesus, Feedzai, Lisboa, Portugal
  • Valerie Chen, Carnegie Mellon University, Pittsburgh, PA
  • Vladimir Balayan, Feedzai, Lisboa, Portugal
  • Pedro Saleiro, Feedzai, Lisboa, Portugal
  • Pedro Bizarro, Feedzai, Lisboa, Portugal
  • Ameet Talwalkar, Carnegie Mellon University, Pittsburgh, PA
  • Rayid Ghani, Carnegie Mellon University, Pittsburgh, PA

DOI:

https://doi.org/10.1609/aaai.v38i19.30082

Keywords:

General

Abstract

Most existing evaluations of explainable machine learning (ML) methods rely on simplifying assumptions or proxies that do not reflect real-world use cases; the handful of more robust evaluations in real-world settings have shortcomings in their design, generally leading to overestimation of methods' real-world utility. In this work, we seek to address this by conducting a study that evaluates post-hoc explainable ML methods in a setting consistent with the application context and by providing a template for future evaluation studies. We modify and improve a prior study on e-commerce fraud detection by relaxing the original work's simplifying assumptions that departed from the deployment context. Our study finds no evidence for the utility of the tested explainable ML methods in this context, which is a drastically different conclusion from the earlier work. This highlights how seemingly trivial experimental design choices can yield misleading conclusions about method utility. In addition, our work carries lessons about the necessity of not only evaluating explainable ML methods using tasks, data, users, and metrics grounded in the intended application context but also developing methods tailored to specific applications, moving beyond general-purpose explainable ML methods.

Published

2024-03-24

How to Cite

Amarasinghe, K., Rodolfa, K. T., Jesus, S., Chen, V., Balayan, V., Saleiro, P., Bizarro, P., Talwalkar, A., & Ghani, R. (2024). On the Importance of Application-Grounded Experimental Design for Evaluating Explainable ML Methods. Proceedings of the AAAI Conference on Artificial Intelligence, 38(19), 20921-20929. https://doi.org/10.1609/aaai.v38i19.30082

Issue

Section

AAAI Technical Track on Safe, Robust and Responsible AI Track