Towards Unifying Evaluation of Counterfactual Explanations: Leveraging Large Language Models for Human-Centric Assessments

Authors

  • Marharyta Domnich, Institute of Computer Science, University of Tartu, Tartu, Estonia
  • Julius Välja, Institute of Computer Science, University of Tartu, Tartu, Estonia
  • Rasmus Moorits Veski, Institute of Computer Science, University of Tartu, Tartu, Estonia; École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  • Giacomo Magnifico, Institute of Computer Science, University of Tartu, Tartu, Estonia
  • Kadi Tulver, Institute of Computer Science, University of Tartu, Tartu, Estonia
  • Eduard Barbu, Institute of Computer Science, University of Tartu, Tartu, Estonia
  • Raul Vicente, Institute of Computer Science, University of Tartu, Tartu, Estonia

DOI:

https://doi.org/10.1609/aaai.v39i15.33791

Abstract

As machine learning models evolve, maintaining transparency demands more human-centric explainable AI techniques. Counterfactual explanations, with roots in human reasoning, identify the minimal input changes needed to obtain a given output and, hence, are crucial for supporting decision-making. Despite their importance, the evaluation of these explanations often lacks grounding in user studies and remains fragmented, with existing metrics not fully capturing human perspectives. To address this challenge, we developed a diverse set of 30 counterfactual scenarios and collected ratings across 8 evaluation metrics from 206 respondents. Subsequently, we fine-tuned different Large Language Models (LLMs) to predict average or individual human judgments across these metrics. Our methodology allowed LLMs to achieve an accuracy of up to 63% in zero-shot evaluations and 85% (on a three-class prediction task) with fine-tuning across all metrics. The fine-tuned models predicting human ratings offer better comparability and scalability in evaluating different counterfactual explanation frameworks.
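The three-class accuracy mentioned above can be illustrated with a minimal sketch. It is not the authors' evaluation code. The 1-to-5 rating scale, the cut-offs 2.5 and 3.5, the class names, and the function names to_class and three_class_accuracy are all assumptions made for illustration. The sketch averages the individual human ratings for each scenario, maps the average to one of three classes, and compares it with a model's predicted class.

```python
from statistics import mean


def to_class(score, low_cut=2.5, high_cut=3.5):
    """Map a rating to one of three classes (cut-offs are hypothetical)."""
    if score < low_cut:
        return "low"
    if score > high_cut:
        return "high"
    return "medium"


def three_class_accuracy(human_ratings, predicted_classes):
    """Fraction of scenarios where the predicted class matches the class
    derived from the average human rating for that scenario."""
    gold = [to_class(mean(ratings)) for ratings in human_ratings]
    hits = sum(g == p for g, p in zip(gold, predicted_classes))
    return hits / len(gold)


# Toy data, invented for illustration: four scenarios, three raters each.
ratings = [[1, 2, 2], [3, 3, 4], [5, 4, 5], [2, 3, 3]]
preds = ["low", "medium", "high", "low"]
accuracy = three_class_accuracy(ratings, preds)
print(accuracy)
```

Predicting an individual respondent's judgment rather than the average would use the same comparison, with each respondent's own rating taken as the gold label.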

Published

2025-04-11

How to Cite

Domnich, M., Välja, J., Veski, R. M., Magnifico, G., Tulver, K., Barbu, E., & Vicente, R. (2025). Towards Unifying Evaluation of Counterfactual Explanations: Leveraging Large Language Models for Human-Centric Assessments. Proceedings of the AAAI Conference on Artificial Intelligence, 39(15), 16308–16316. https://doi.org/10.1609/aaai.v39i15.33791

Section

AAAI Technical Track on Machine Learning I