Inference Scaling Law for Retrieval Augmented Generation

Authors

  • Shu Zhou: School of Information Management, Nanjing University, China; Key Laboratory of Data Engineering and Knowledge Services in Jiangsu Provincial Universities (Nanjing University), China; Jiangsu International Joint Informatics Laboratory, Nanjing University, China
  • Yuxuan Ao: School of Information Management, Nanjing University, China; Key Laboratory of Data Engineering and Knowledge Services in Jiangsu Provincial Universities (Nanjing University), China; Jiangsu International Joint Informatics Laboratory, Nanjing University, China
  • Yunyang Xuan: School of Information Management, Nanjing University, China; Key Laboratory of Data Engineering and Knowledge Services in Jiangsu Provincial Universities (Nanjing University), China; Jiangsu International Joint Informatics Laboratory, Nanjing University, China
  • Xin Wang: Baidu Inc, Beijing, China
  • Tao Fan: School of Public Administration, Nanjing University of Finance & Economics, China
  • Hao Wang: School of Information Management, Nanjing University, China; Key Laboratory of Data Engineering and Knowledge Services in Jiangsu Provincial Universities (Nanjing University), China; Jiangsu International Joint Informatics Laboratory, Nanjing University, China

DOI:

https://doi.org/10.1609/aaai.v40i19.38692

Abstract

Retrieval-augmented generation (RAG) has recently emerged as a powerful framework for knowledge-intensive natural language processing tasks, leveraging the strengths of both pre-trained language models and external knowledge. While significant progress has been made, the scaling behavior of these approaches during inference remains poorly understood. To this end, this paper presents a comprehensive study of the inference scaling law for RAG models, investigating how inference performance scales with key factors including retriever model scale, generator model scale, number of retrieved documents, and context window size. Through extensive experiments on benchmark datasets, we establish empirical scaling laws that reveal power-law and sigmoid-type relationships between these factors and performance. We further build a joint inference scaling law with theoretical justification. With the proposed scaling laws, we can anticipate the performance trends of RAG models under different computational budgets. We believe these insights pave the way for the efficient and effective deployment of RAG models across a wider range of applications.
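As a minimal illustration of the kind of empirical fit the abstract describes (not the paper's actual method or data), a power-law relationship of the form performance ≈ a · scale^b can be estimated by linear regression in log-log space; the coefficients a and b and the synthetic data points below are purely hypothetical:

```python
import math

def fit_power_law(scales, perfs):
    """Fit perf ≈ a * scale**b by least squares on log-transformed data."""
    xs = [math.log(s) for s in scales]
    ys = [math.log(p) for p in perfs]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Slope of the log-log regression line gives the exponent b;
    # the intercept gives log(a).
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - b * mx)
    return a, b

# Synthetic data generated from perf = 0.5 * scale**0.3 (illustrative only)
scales = [1e8, 1e9, 1e10, 1e11]
perfs = [0.5 * s ** 0.3 for s in scales]
a, b = fit_power_law(scales, perfs)
print(round(a, 3), round(b, 3))  # recovers a ≈ 0.5, b ≈ 0.3
```

A sigmoid-type relationship (e.g. for saturating factors such as the number of retrieved documents) would instead be fitted with a nonlinear optimizer, since it is not linear in log space.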

Published

2026-03-14

How to Cite

Zhou, S., Ao, Y., Xuan, Y., Wang, X., Fan, T., & Wang, H. (2026). Inference Scaling Law for Retrieval Augmented Generation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(19), 16522–16530. https://doi.org/10.1609/aaai.v40i19.38692

Section

AAAI Technical Track on Data Mining & Knowledge Management III