PRGB Benchmark: A Robust Placeholder-Assisted Algorithm for Benchmarking Retrieval-Augmented Generation

Zhehao Tan; Yihan Jiao; Dan Yang; Junwei Liu; Lei Liu; Jie Feng; Duolin Sun; Yue Shen; Jian Wang; Peng Wei; Jinjie Gu

doi:10.1609/aaai.v40i39.40602

Authors

Zhehao Tan, AntGroup
Yihan Jiao, AntGroup
Dan Yang, AntGroup
Junwei Liu, Peking University
Lei Liu, AntGroup
Jie Feng, AntGroup
Duolin Sun, AntGroup
Yue Shen, AntGroup
Jian Wang, AntGroup
Peng Wei, AntGroup
Jinjie Gu, AntGroup

Abstract

Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by integrating external knowledge, and it depends on the LLM's ability to generate responses from the combination of a given query and retrieved documents. However, most benchmarks focus on overall RAG system performance and rarely assess LLM-specific capabilities. Current benchmarks emphasize broad aspects such as noise robustness but lack a systematic, granular evaluation framework for document utilization. To this end, we introduce Placeholder-RAG-Benchmark, a multi-level, fine-grained benchmark that emphasizes the following progressive dimensions: (1) multi-level filtering abilities, (2) combination abilities, and (3) reference reasoning. To provide a more nuanced understanding of LLMs' roles in RAG systems, we formulate an innovative placeholder-based approach that decouples the contributions of the LLM's parametric knowledge and the external knowledge. Experiments reveal the limitations of representative LLMs in the generation stage of RAG systems, particularly in error resilience and context faithfulness. Our benchmark provides a reproducible framework for developing more reliable and efficient RAG systems.
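
The abstract does not spell out how the placeholder-based approach works, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that documents and queries are written as templates with named slots, that each slot is filled with a synthetic value a model is unlikely to have memorized, and that a response counts as context-faithful when it contains the synthetic answer. All names here (`random_token`, `fill_placeholders`, `build_sample`, `is_context_faithful`) and the substring-match scoring rule are hypothetical.

```python
import random
import string


def random_token(rng: random.Random, length: int = 8) -> str:
    """Return a synthetic value that a model is unlikely to have memorized."""
    return "".join(rng.choice(string.ascii_uppercase) for _ in range(length))


def fill_placeholders(template: str, values: dict[str, str]) -> str:
    """Replace each {name} slot in the template with its assigned value."""
    return template.format(**values)


def build_sample(doc_template: str, query_template: str,
                 slot_names: list[str], seed: int) -> dict:
    """Fill a document and a query with the same synthetic slot values.

    Assumption: because the values are synthetic, a correct answer can only
    come from the supplied document, not from parametric knowledge.
    """
    rng = random.Random(seed)  # seeded so that samples are reproducible
    values = {name: random_token(rng) for name in slot_names}
    return {
        "document": fill_placeholders(doc_template, values),
        "query": fill_placeholders(query_template, values),
        "values": values,
    }


def is_context_faithful(response: str, expected: str) -> bool:
    """Hypothetical scoring rule: the response must contain the synthetic answer."""
    return expected in response


# Hypothetical usage: the expected answer is the value of the "designer" slot.
sample = build_sample(
    doc_template="The {device} was designed by {designer}.",
    query_template="Who designed the {device}?",
    slot_names=["device", "designer"],
    seed=0,
)
expected_answer = sample["values"]["designer"]
```

Under these assumptions, re-running `build_sample` with a different seed yields the same question with a different answer, which is one way to check whether a model reads the document rather than recalling a memorized fact.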
