CLEAR: Error Analysis via LLM-as-a-Judge Made Easy

Asaf Yehudai; Lilach Eden; Yotam Perlitz; Roy Bar-Haim; Michal Shmueli-Scheuer

doi:10.1609/aaai.v40i48.42398

Authors

Asaf Yehudai, Hebrew University of Jerusalem; IBM Research
Lilach Eden, IBM Research
Yotam Perlitz, IBM Research
Roy Bar-Haim, IBM Research
Michal Shmueli-Scheuer, IBM Research

DOI: https://doi.org/10.1609/aaai.v40i48.42398

Abstract

The evaluation of Large Language Models (LLMs) increasingly relies on other LLMs acting as judges. However, current evaluation paradigms typically yield a single score or ranking, answering which model is better but not why. While essential for benchmarking, these top-level scores obscure the specific, actionable reasons behind a model's performance. To bridge this gap, we introduce CLEAR, an interactive, open-source package for LLM-based error analysis. CLEAR first generates per-instance textual feedback, then creates a set of system-level error issues, and finally quantifies the prevalence of each identified issue. The package also provides an interactive dashboard that supports comprehensive error analysis: users can view aggregate visualizations, apply interactive filters to isolate specific issues or score ranges, and drill down to the individual instances that exemplify a particular behavioral pattern. We demonstrate CLEAR analysis on RAG and Math benchmarks, and showcase its utility through a user case study.
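
The flow described in the abstract (per-instance feedback, system-level issues, prevalence, then filtering by issue or score range) can be illustrated with a minimal Python sketch. This is not CLEAR's actual API: every name here (`Judged`, `tag_issues`, `prevalence`, `filter_records`) is hypothetical, the feedback strings and scores are made-up toy data, scores are assumed to lie in [0, 1], and simple keyword matching stands in for the LLM judge that would map feedback to issues in a real system.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Judged:
    """One evaluated instance (hypothetical record layout)."""
    instance_id: str
    score: float                      # assumed to lie in [0, 1]
    feedback: str                     # per-instance textual feedback
    issues: list = field(default_factory=list)


def tag_issues(record, issue_keywords):
    """Stand-in for the LLM step that maps feedback to system-level issues.

    Here an issue is attached when its keyword appears in the feedback.
    """
    text = record.feedback.lower()
    record.issues = [name for name, kw in issue_keywords.items() if kw in text]
    return record


def prevalence(records):
    """Fraction of instances that exhibit each issue."""
    counts = Counter(issue for r in records for issue in set(r.issues))
    return {issue: c / len(records) for issue, c in counts.items()}


def filter_records(records, issue=None, min_score=0.0, max_score=1.0):
    """Keep instances within a score range, optionally with a given issue."""
    return [
        r for r in records
        if (issue is None or issue in r.issues)
        and min_score <= r.score <= max_score
    ]


# Toy data: feedback a judge might have written for four instances.
records = [
    Judged("q1", 0.2, "The answer contains a calculation error in step 2."),
    Judged("q2", 0.9, "Correct and complete."),
    Judged("q3", 0.4, "Calculation error; also the final answer is missing."),
    Judged("q4", 0.5, "The final answer is missing."),
]

# Hypothetical issue list; in practice this would be discovered, not hard-coded.
issue_keywords = {
    "calculation_error": "calculation error",
    "missing_final_answer": "final answer is missing",
}

records = [tag_issues(r, issue_keywords) for r in records]
issue_share = prevalence(records)
low_scoring_calc = filter_records(records, issue="calculation_error", max_score=0.3)
```

In this sketch, `issue_share` plays the role of the aggregate view, while `low_scoring_calc` corresponds to drilling down from an issue and a score range to the individual instances behind it.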

Published

2026-03-14

How to Cite

Yehudai, A., Eden, L., Perlitz, Y., Bar-Haim, R., & Shmueli-Scheuer, M. (2026). CLEAR: Error Analysis via LLM-as-a-Judge Made Easy. Proceedings of the AAAI Conference on Artificial Intelligence, 40(48), 41736–41738. https://doi.org/10.1609/aaai.v40i48.42398

Issue

Vol. 40 No. 48: EAAI-26 AI for Education, Model AI Assignments, AAAI-26 Emerging Trends, Doctoral Consortium, Student Abstracts, Undergraduate Consortium and Demonstrations

Section

AAAI Demonstration Track
