Inference Scaling Law for Retrieval Augmented Generation

Authors

  • Shu Zhou: School of Information Management, Nanjing University, China; Key Laboratory of Data Engineering and Knowledge Services in Jiangsu Provincial Universities (Nanjing University), China; Jiangsu International Joint Informatics Laboratory, Nanjing University, China
  • Yuxuan Ao: School of Information Management, Nanjing University, China; Key Laboratory of Data Engineering and Knowledge Services in Jiangsu Provincial Universities (Nanjing University), China; Jiangsu International Joint Informatics Laboratory, Nanjing University, China
  • Yunyang Xuan: School of Information Management, Nanjing University, China; Key Laboratory of Data Engineering and Knowledge Services in Jiangsu Provincial Universities (Nanjing University), China; Jiangsu International Joint Informatics Laboratory, Nanjing University, China
  • Xin Wang: Baidu Inc, Beijing, China
  • Tao Fan: School of Public Administration, Nanjing University of Finance & Economics, China
  • Hao Wang: School of Information Management, Nanjing University, China; Key Laboratory of Data Engineering and Knowledge Services in Jiangsu Provincial Universities (Nanjing University), China; Jiangsu International Joint Informatics Laboratory, Nanjing University, China

DOI:

https://doi.org/10.1609/aaai.v40i19.38692

Abstract

Retrieval-augmented generation (RAG) has recently emerged as a powerful framework for knowledge-intensive natural language processing tasks, leveraging the strengths of both pre-trained language models and external knowledge. While significant progress has been made, the scaling behavior of these approaches during inference remains poorly understood. To this end, this paper presents a comprehensive study of the inference scaling law for RAG models, investigating how inference performance scales with key factors including retriever model scale, generator model scale, number of retrieved documents, and context window size. Through extensive experiments on benchmark datasets, we establish empirical scaling laws that reveal power-law and sigmoid-type relationships between these factors and performance. We further build a joint inference scaling law with theoretical justification. With the proposed scaling laws, we can anticipate the performance trends of RAG models under different computational budgets. We believe these insights pave the way for the efficient and effective deployment of RAG models across a wider range of applications.
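As a minimal illustration of the kind of empirical fit the abstract describes (not the paper's actual method or data), a power-law relationship of the form performance ≈ a · scale^b can be estimated by linear regression in log-log space; the coefficients a and b and the synthetic data points below are purely hypothetical:

```python
import math

def fit_power_law(scales, perfs):
    """Fit perf ≈ a * scale**b by least squares on log-transformed data."""
    xs = [math.log(s) for s in scales]
    ys = [math.log(p) for p in perfs]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Slope of the log-log regression line gives the exponent b;
    # the intercept gives log(a).
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - b * mx)
    return a, b

# Synthetic data generated from perf = 0.5 * scale**0.3 (illustrative only)
scales = [1e8, 1e9, 1e10, 1e11]
perfs = [0.5 * s ** 0.3 for s in scales]
a, b = fit_power_law(scales, perfs)
print(round(a, 3), round(b, 3))  # recovers a ≈ 0.5, b ≈ 0.3
```

A sigmoid-type relationship (e.g. for saturating factors such as the number of retrieved documents) would instead be fitted with a nonlinear optimizer, since it is not linear in log space.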

Published

2026-03-14

How to Cite

Zhou, S., Ao, Y., Xuan, Y., Wang, X., Fan, T., & Wang, H. (2026). Inference Scaling Law for Retrieval Augmented Generation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(19), 16522–16530. https://doi.org/10.1609/aaai.v40i19.38692

Section

AAAI Technical Track on Data Mining & Knowledge Management III