Automated Assessment of Fidelity and Interpretability: An Evaluation Framework for Large Language Models’ Explanations (Student Abstract)

Mu-Tien Kuo; Chih-Chung Hsueh; Richard Tzong-Han Tsai

doi:10.1609/aaai.v38i21.30470

Authors

Mu-Tien Kuo (Chingshin Academy; Research Center for Humanities and Social Sciences, Academia Sinica)
Chih-Chung Hsueh (Chingshin Academy; Research Center for Humanities and Social Sciences, Academia Sinica)
Richard Tzong-Han Tsai (Dept. of Computer Science and Engineering, National Central University, Taiwan; Research Center for Humanities and Social Sciences, Academia Sinica)

DOI:

https://doi.org/10.1609/aaai.v38i21.30470

Keywords:

Large Language Models, Explainable AI, Explainability, Fidelity, Faithfulness, Interpretability

Abstract

As Large Language Models (LLMs) become more prevalent in various fields, it is crucial to rigorously assess the quality of their explanations. Our research introduces a task-agnostic framework for evaluating free-text rationales, drawing on insights from both linguistics and machine learning. We evaluate two dimensions of explainability: fidelity and interpretability. For fidelity, we propose methods suitable for proprietary LLMs where direct introspection of internal features is unattainable. For interpretability, we use language models instead of human evaluators, addressing concerns about subjectivity and scalability in evaluations. We apply our framework to evaluate GPT-3.5 and the impact of prompts on the quality of its explanations. In conclusion, our framework streamlines the evaluation of explanations from LLMs, promoting the development of safer models.

Published

2024-03-24

How to Cite

Kuo, M.-T., Hsueh, C.-C., & Tsai, R. T.-H. (2024). Automated Assessment of Fidelity and Interpretability: An Evaluation Framework for Large Language Models’ Explanations (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 38(21), 23554–23555. https://doi.org/10.1609/aaai.v38i21.30470

Issue

Vol. 38 No. 21: IAAI-24, EAAI-24, AAAI-24 Student Abstracts, Undergraduate Consortium and Demonstrations

Section

AAAI Student Abstract and Poster Program