PRGB Benchmark: A Robust Placeholder-Assisted Algorithm for Benchmarking Retrieval-Augmented Generation

Authors

  • Zhehao Tan, AntGroup
  • Yihan Jiao, AntGroup
  • Dan Yang, AntGroup
  • Junwei Liu, Peking University
  • Lei Liu, AntGroup
  • Jie Feng, AntGroup
  • Duolin Sun, AntGroup
  • Yue Shen, AntGroup
  • Jian Wang, AntGroup
  • Peng Wei, AntGroup
  • Jinjie Gu, AntGroup

DOI

https://doi.org/10.1609/aaai.v40i39.40602

Abstract

Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by integrating external knowledge, so the LLM's ability to generate responses from a given query combined with retrieved documents is crucial. However, most benchmarks focus on overall RAG system performance and rarely assess LLM-specific capabilities. Current benchmarks emphasize broad aspects such as noise robustness but lack a systematic, granular framework for evaluating document utilization. To this end, we introduce Placeholder-RAG-Benchmark (PRGB), a multi-level, fine-grained benchmark that emphasizes the following progressive dimensions: (1) multi-level filtering abilities, (2) combination abilities, and (3) reference reasoning. To provide a more nuanced understanding of LLMs' roles in RAG systems, we formulate an innovative placeholder-based approach that decouples the contributions of the LLM's parametric knowledge from those of the external knowledge. Experiments reveal limitations in the generation capabilities of representative LLMs within RAG systems, particularly in error resilience and context faithfulness. Our benchmark provides a reproducible framework for developing more reliable and efficient RAG systems.
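
The abstract does not spell out how the placeholder mechanism works, so the following is only an illustrative sketch of one way such decoupling could be set up, not the authors' implementation. It assumes that documents are written as templates with named slots, that a slot can be filled with a synthetic value that contradicts real-world knowledge, and that an answer can then be labeled by which value it repeats. The template text, the slot names, and the helpers `fill` and `classify_answer` are all hypothetical.

```python
import re

# Hypothetical template: slots in braces stand in for concrete entities.
TEMPLATE = "{company} was founded in {year} by {founder}."
QUESTION = "Who founded {company}?"


def fill(template: str, values: dict[str, str]) -> str:
    """Replace each {slot} with its value; raises KeyError if a slot is missing."""
    return re.sub(r"\{(\w+)\}", lambda m: values[m.group(1)], template)


def classify_answer(answer: str, placeholder_value: str, real_value: str) -> str:
    """Label an answer by the knowledge source it appears to reflect."""
    text = answer.lower()
    if placeholder_value.lower() in text:
        return "context"      # answer follows the retrieved document
    if real_value.lower() in text:
        return "parametric"   # answer follows memorized knowledge
    return "neither"


# Assumed setup: the founder slot gets an invented name, so a model that
# answers with the real founder is relying on parametric knowledge.
synthetic = {"company": "Microsoft", "year": "1975", "founder": "Zorin Vale"}
document = fill(TEMPLATE, synthetic)
question = fill(QUESTION, synthetic)
```

In this sketch, an answer that names "Zorin Vale" would count as faithful to the retrieved context, while an answer that names the real founder would suggest the model fell back on what it memorized during training. A real evaluation would replace the substring match with a more robust answer-matching step.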

Published

2026-03-14

How to Cite

Tan, Z., Jiao, Y., Yang, D., Liu, J., Liu, L., Feng, J., … Gu, J. (2026). PRGB Benchmark: A Robust Placeholder-Assisted Algorithm for Benchmarking Retrieval-Augmented Generation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(39), 33179–33186. https://doi.org/10.1609/aaai.v40i39.40602

Issue

Vol. 40 No. 39 (2026)

Section

AAAI Technical Track on Natural Language Processing IV