Inference Scaling Law for Retrieval Augmented Generation
DOI: https://doi.org/10.1609/aaai.v40i19.38692

Abstract
Retrieval-augmented generation (RAG) has recently emerged as a powerful framework for knowledge-intensive natural language processing tasks, leveraging the strengths of both pre-trained language models and external knowledge. While significant progress has been made, the scaling behavior of these approaches during inference remains poorly understood. To this end, this paper presents a comprehensive study of inference scaling laws for RAG models, investigating how inference performance scales with key factors including retriever model scale, generator model scale, number of retrieved documents, and context window size. Through extensive experiments on benchmark datasets, we establish empirical scaling laws that reveal power-law and sigmoid-type relationships between these factors and performance. We further build a joint inference scaling law with theoretical justification. With the proposed scaling laws, we can anticipate the performance of RAG models under different computational budgets. We believe our insights pave the way for efficient and effective deployment of RAG models in a wider range of applications.
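The abstract mentions fitting power-law relationships between scaling factors (e.g., number of retrieved documents) and performance. As a minimal illustration of how such an empirical law can be fit, the sketch below estimates the parameters of y = a * x^b by linear least squares in log-log space. The data values and parameters here are hypothetical placeholders, not results from the paper.

```python
import numpy as np

def fit_power_law(x, y):
    """Fit y ≈ a * x**b via linear least squares on (log x, log y)."""
    b, log_a = np.polyfit(np.log(x), np.log(y), 1)
    return np.exp(log_a), b

# Hypothetical performance vs. number of retrieved documents,
# generated from y = 0.5 * x**0.3 purely for illustration.
x = np.array([1, 2, 4, 8, 16, 32], dtype=float)
y = 0.5 * x**0.3
a, b = fit_power_law(x, y)
print(a, b)  # recovers a ≈ 0.5, b ≈ 0.3
```

A sigmoid-type relationship, by contrast, would saturate at large x and is typically fit with nonlinear least squares (e.g., `scipy.optimize.curve_fit`) rather than a log-log transform.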
Published
2026-03-14
How to Cite
Zhou, S., Ao, Y., Xuan, Y., Wang, X., Fan, T., & Wang, H. (2026). Inference Scaling Law for Retrieval Augmented Generation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(19), 16522–16530. https://doi.org/10.1609/aaai.v40i19.38692
Section
AAAI Technical Track on Data Mining & Knowledge Management III