EvalAssist: LLM-as-a-Judge Simplified

Michael Desmond; Zahra Ashktorab; Werner Geyer; Elizabeth M. Daly; Martín Santillán Cooper; Qian Pan; Rahul Nair; Nico Wagner; Tejaswini Pedapati

doi:10.1609/aaai.v39i28.35351

Authors

Michael Desmond, IBM Research
Zahra Ashktorab, IBM Research
Werner Geyer, IBM Research
Elizabeth M. Daly, IBM Research
Martín Santillán Cooper, IBM Research
Qian Pan, IBM Research
Rahul Nair, IBM Research
Nico Wagner, IBM Research
Tejaswini Pedapati, IBM Research

DOI:

https://doi.org/10.1609/aaai.v39i28.35351

Abstract

We present EvalAssist, a framework that simplifies the LLM-as-a-judge workflow. The system provides an online criteria development environment where users can interactively build, test, and share custom evaluation criteria in a structured and portable format. It also makes available a library of LLM-based evaluators that incorporates algorithmic innovations such as token-probability-based judgement, positional bias checking, and certainty estimation, which help to engender trust in the evaluation process. We have computed extensive benchmarks and deployed the system internally in our organization, where it has several hundred users.
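
To illustrate one of the ideas named in the abstract, positional bias checking in a pairwise comparison can be sketched as follows. This is a minimal, generic sketch and not EvalAssist's implementation: the `Judge` signature, the `check_positional_bias` function, and the two toy judges are hypothetical names introduced here for illustration. The idea is to ask the judge twice with the two responses in swapped positions and to flag the verdict when the preferred response changes.

```python
from typing import Callable, Optional, TypedDict

# Hypothetical judge signature (an assumption, not an EvalAssist API):
# takes a criterion and two responses, returns "A" if it prefers the
# response shown first, otherwise "B".
Judge = Callable[[str, str, str], str]


class BiasCheck(TypedDict):
    consistent: bool
    winner: Optional[str]


def check_positional_bias(judge: Judge, criterion: str,
                          first: str, second: str) -> BiasCheck:
    """Run the judge in both orders and compare the preferred response."""
    forward = judge(criterion, first, second)   # `first` is shown in slot A
    reverse = judge(criterion, second, first)   # `second` is shown in slot A

    # Map the slot labels back to the underlying responses.
    winner_forward = first if forward == "A" else second
    winner_reverse = second if reverse == "A" else first

    consistent = winner_forward == winner_reverse
    return {"consistent": consistent,
            "winner": winner_forward if consistent else None}


# Toy judges standing in for an LLM call (illustrative only).
def always_first(criterion: str, a: str, b: str) -> str:
    return "A"  # fully position-biased: always prefers slot A


def prefers_shorter(criterion: str, a: str, b: str) -> str:
    return "A" if len(a) <= len(b) else "B"  # depends on content only
```

With these toy judges, `always_first` is flagged as inconsistent because its preference follows the slot rather than the content, while `prefers_shorter` selects the same response in both orders.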

Published

2025-04-11

How to Cite

Desmond, M., Ashktorab, Z., Geyer, W., Daly, E. M., Santillán Cooper, M., Pan, Q., … Pedapati, T. (2025). EvalAssist: LLM-as-a-Judge Simplified. Proceedings of the AAAI Conference on Artificial Intelligence, 39(28), 29637–29639. https://doi.org/10.1609/aaai.v39i28.35351

Issue

Vol. 39 No. 28: IAAI-25, EAAI-25, AAAI-25 Student Abstracts, Undergraduate Consortium and Demonstrations

Section

AAAI Demonstration Track
