Making Sense of LLM Decisions: A Prototype-based Framework for Explainable Classification

Bowen Wei; Mehrdad Fazli; Ziwei Zhu

doi:10.1609/aaai.v40i32.39892

Authors

Bowen Wei, George Mason University
Mehrdad Fazli, George Mason University
Ziwei Zhu, George Mason University

Abstract

Large language models have demonstrated impressive performance on natural language tasks, but their decision-making processes remain opaque. Existing explanation methods either suffer from limited faithfulness to the model's reasoning or produce explanations that are difficult for humans to understand. To address these challenges, we propose ProtoSurE, a novel prototype-based surrogate framework that provides faithful and understandable explanations for LLMs. ProtoSurE trains an interpretable-by-design surrogate model that aligns with the target LLM while utilizing sentence-level prototypes as understandable concepts. Extensive experiments show that ProtoSurE consistently outperforms state-of-the-art explanation methods across diverse LLMs and datasets. Importantly, ProtoSurE demonstrates strong data efficiency, requiring relatively few training examples to achieve good performance, making it practical for real-world applications.
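
The abstract does not spell out the architecture, so the following is only a minimal sketch of what a prototype-based surrogate classifier could look like, not the ProtoSurE implementation. It assumes sentence embeddings are already available as plain vectors, scores each prototype by its best-matching sentence under cosine similarity, maps those activations to class probabilities with a linear layer, and measures alignment with the target LLM through a KL divergence. All names (`prototype_activations`, `surrogate_predict`, `kl_divergence`), the similarity measure, and the loss are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prototype_activations(sentence_embs, prototypes):
    """For each prototype, return (best similarity, index of best sentence).

    Assumption: a prototype's activation is its maximum cosine similarity
    over the sentences of the input.
    """
    acts = []
    for p in prototypes:
        sims = [cosine(s, p) for s in sentence_embs]
        best = max(range(len(sims)), key=sims.__getitem__)
        acts.append((sims[best], best))
    return acts


def surrogate_predict(sentence_embs, prototypes, weights):
    """Hypothetical surrogate: prototype activations -> linear layer -> softmax.

    weights[c][k] is the contribution of prototype k to class c.
    """
    acts = prototype_activations(sentence_embs, prototypes)
    logits = [sum(w * a for w, (a, _) in zip(row, acts)) for row in weights]
    return softmax(logits), acts


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q); an assumed alignment loss between LLM and surrogate outputs."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


# Toy example with made-up 2-d "sentence embeddings" and two prototypes.
sentence_embs = [[1.0, 0.0], [0.9, 0.1]]
prototypes = [[1.0, 0.0], [0.0, 1.0]]
weights = [[2.0, 0.0], [0.0, 2.0]]

probs, acts = surrogate_predict(sentence_embs, prototypes, weights)
llm_probs = [1.0, 0.0]  # placeholder for the target LLM's class distribution
alignment_loss = kl_divergence(llm_probs, probs)
```

In a sketch like this, the explanation for a prediction would be the prototype with the largest weighted activation together with the input sentence that matched it, which is what makes the surrogate readable by design.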

Published

2026-03-14

How to Cite

Wei, B., Fazli, M., & Zhu, Z. (2026). Making Sense of LLM Decisions: A Prototype-based Framework for Explainable Classification. Proceedings of the AAAI Conference on Artificial Intelligence, 40(32), 26814–26822. https://doi.org/10.1609/aaai.v40i32.39892

Issue

Vol. 40 No. 32: AAAI-26 Technical Tracks 32

Section

AAAI Technical Track on Machine Learning IX
