In-Situ Eval: A Modular Framework for Custom and Real-Time RAG Benchmarking

Ritvik Garimella; Kaushik Roy; Chathurangi Shyalika; Amit Sheth

doi:10.1609/aaai.v40i48.42348

Authors

Ritvik Garimella University of South Carolina, Columbia, SC
Kaushik Roy University of Alabama, Tuscaloosa, AL
Chathurangi Shyalika University of South Carolina, Columbia, SC
Amit Sheth University of South Carolina, Columbia, SC

DOI:

https://doi.org/10.1609/aaai.v40i48.42348

Abstract

Retrieval-Augmented Generation (RAG) has become the standard approach for integrating domain knowledge into Large Language Models (LLMs). However, fair comparison of RAG pipelines remains difficult: data preparation is often ad hoc, subsampling methods are opaque, parameters vary across implementations, and evaluation is fragmented. We present In-Situ Eval, a unified and reproducible framework that operationalizes the full RAG pipeline with configurable subsampling strategies and both RAG-specific and generic evaluation metrics. The platform supports two execution modes: an offline Dataset mode for evaluating precomputed outputs, and a live Retrieval mode for benchmarking RAG variants with state-of-the-art LLMs. Users can flexibly select datasets, retrieval techniques, models, and metrics, enabling side-by-side comparisons, ablations, and targeted analyses. This holistic approach reduces computational costs, clarifies the impact of subsampling techniques, and provides actionable insights for real-world deployments. By facilitating transparent, customizable, and interactive benchmarking, In-Situ Eval empowers both researchers and practitioners to make informed decisions in adapting RAG pipelines to domain-specific needs.
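The abstract describes two execution modes that share subsampling and metric stages. The sketch below shows how such a split might be organized. It is not In-Situ Eval's actual API. Every name in it is hypothetical: `Example`, `EvalConfig`, `subsample`, `exact_match`, `run_eval`, and the `"head"`/`"random"` strategies. Exact match stands in for the paper's generic metrics, and a plain callable stands in for a live RAG pipeline.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Example:
    """Hypothetical record: a question, its reference answer, and an optional precomputed output."""
    question: str
    reference: str
    prediction: Optional[str] = None  # only used in the offline "dataset" mode


@dataclass
class EvalConfig:
    """Hypothetical run configuration; field names are illustrative, not the framework's."""
    mode: str                  # "dataset" (score stored outputs) or "retrieval" (generate live)
    subsample: str = "random"  # illustrative strategies: "random" or "head"
    k: int = 100               # number of examples to keep
    seed: int = 0              # fixed seed so the random subsample is reproducible


def subsample(data: List[Example], cfg: EvalConfig) -> List[Example]:
    """Pick at most cfg.k examples using the configured strategy."""
    if cfg.k >= len(data):
        return list(data)
    if cfg.subsample == "head":
        return data[: cfg.k]
    if cfg.subsample == "random":
        rng = random.Random(cfg.seed)  # local RNG: same seed -> same subsample
        return rng.sample(data, cfg.k)
    raise ValueError(f"unknown subsampling strategy: {cfg.subsample}")


def exact_match(pred: str, ref: str) -> float:
    """A generic (not RAG-specific) metric: case- and whitespace-insensitive equality."""
    return float(pred.strip().lower() == ref.strip().lower())


def run_eval(
    data: List[Example],
    cfg: EvalConfig,
    pipeline: Optional[Callable[[str], str]] = None,
    metric: Callable[[str, str], float] = exact_match,
) -> float:
    """Subsample, obtain a prediction per example according to the mode, and average the metric."""
    sample = subsample(data, cfg)
    scores = []
    for ex in sample:
        if cfg.mode == "dataset":
            if ex.prediction is None:
                raise ValueError("dataset mode needs precomputed predictions")
            pred = ex.prediction  # offline: score the stored output as-is
        elif cfg.mode == "retrieval":
            if pipeline is None:
                raise ValueError("retrieval mode needs a pipeline callable")
            pred = pipeline(ex.question)  # live: query the (stand-in) RAG pipeline
        else:
            raise ValueError(f"unknown mode: {cfg.mode}")
        scores.append(metric(pred, ex.reference))
    return sum(scores) / len(scores) if scores else 0.0
```

Both modes share the same subsampling and scoring path and differ only in where each prediction comes from. This is one way to keep side-by-side comparisons between offline outputs and live runs like-for-like. Seeding a local `random.Random` makes a subsample repeatable across runs without touching global random state.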
