Auto-BenchmarkCard: Automated Synthesis of Benchmark Documentation

Aris Hofmann; Inge Vejsbjerg; Dhaval Salwala; Elizabeth M. Daly

doi:10.1609/aaai.v40i48.42352

Authors

Aris Hofmann, International Business Machines
Inge Vejsbjerg, International Business Machines
Dhaval Salwala, International Business Machines
Elizabeth M. Daly, IBM Research

Abstract

We present Auto-BenchmarkCard, a workflow for generating validated descriptions of AI benchmarks. Benchmark documentation is often incomplete or inconsistent, making it difficult to interpret and compare benchmarks across tasks or domains. Auto-BenchmarkCard addresses this gap by combining multi-agent data extraction from heterogeneous sources (e.g., Hugging Face, Unitxt, academic papers) with LLM-driven synthesis. A validation phase evaluates factual accuracy through atomic entailment scoring using the FactReasoner tool. This workflow has the potential to promote transparency, comparability, and reusability in AI benchmark reporting, enabling researchers and practitioners to better navigate and evaluate benchmark choices.
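
The validation idea mentioned in the abstract (splitting a generated description into atomic claims and scoring each one against source evidence) can be illustrated with a minimal sketch. This is not the FactReasoner tool or the authors' implementation: every name below (split_into_atomic_claims, toy_entailment_score, validate_card, the 0.8 threshold) is hypothetical, and the token-overlap score is only a stand-in for a real entailment model.

```python
from dataclasses import dataclass


@dataclass
class ClaimScore:
    claim: str
    score: float     # toy support score in [0, 1]
    supported: bool  # True when score meets the threshold


def split_into_atomic_claims(text: str) -> list[str]:
    """Hypothetical splitter: treat each sentence as one atomic claim."""
    return [s.strip() for s in text.split(".") if s.strip()]


def toy_entailment_score(claim: str, evidence: str) -> float:
    """Stand-in for an entailment model (assumption, not FactReasoner):
    the fraction of claim tokens that also occur in the evidence."""
    claim_tokens = set(claim.lower().split())
    evidence_tokens = set(evidence.lower().replace(".", " ").split())
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & evidence_tokens) / len(claim_tokens)


def validate_card(card_text: str, evidence: str, threshold: float = 0.8) -> list[ClaimScore]:
    """Score every atomic claim of a card against the evidence text."""
    results = []
    for claim in split_into_atomic_claims(card_text):
        score = toy_entailment_score(claim, evidence)
        results.append(ClaimScore(claim, score, score >= threshold))
    return results


def supported_fraction(results: list[ClaimScore]) -> float:
    """Aggregate: share of claims judged supported (0.0 for an empty list)."""
    if not results:
        return 0.0
    return sum(r.supported for r in results) / len(results)


# Made-up example texts, for illustration only.
evidence = "The benchmark contains 1000 questions. It covers reading comprehension in English."
card = "The benchmark contains 1000 questions. It covers translation in French."

results = validate_card(card, evidence)
print(supported_fraction(results))  # prints 0.5
```

In this toy run the first claim is fully covered by the evidence while the second is not, so half of the card counts as supported. A real pipeline would replace the overlap heuristic with a proper entailment scorer and a more careful claim decomposition.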

Published

2026-03-14

How to Cite

Hofmann, A., Vejsbjerg, I., Salwala, D., & Daly, E. M. (2026). Auto-BenchmarkCard: Automated Synthesis of Benchmark Documentation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(48), 41598–41600. https://doi.org/10.1609/aaai.v40i48.42352

Issue

Vol. 40 No. 48: EAAI-26 AI for Education, Model AI Assignments, AAAI-26 Emerging Trends, Doctoral Consortium, Student Abstracts, Undergraduate Consortium and Demonstrations

Section

AAAI Demonstration Track
