RetroLM: Retrieval-Augmented KVs for Long-Context Processing

Authors

  • Kun Luo (The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; Beijing Academy of Artificial Intelligence, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China)
  • Zheng Liu (Beijing Academy of Artificial Intelligence, Beijing, China; Hong Kong Polytechnic University, Hong Kong, China)
  • Shitao Xiao (Beijing Academy of Artificial Intelligence, Beijing, China)
  • Jiabei Chen (The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; Beijing Academy of Artificial Intelligence, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China)
  • Hongjin Qian (Beijing Academy of Artificial Intelligence, Beijing, China; Peking University, Beijing, China)
  • Peitian Zhang (Beijing Academy of Artificial Intelligence, Beijing, China)
  • Shanshan Jiang (Ricoh Software Research Center Beijing, Ricoh Company, Ltd.)
  • Bin Dong (Ricoh Software Research Center Beijing, Ricoh Company, Ltd.)
  • Jun Zhao (The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China)
  • Kang Liu (The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China)

DOI:

https://doi.org/10.1609/aaai.v40i38.40511

Abstract

Long-context processing remains a significant challenge for large language models (LLMs). Retrieval-augmented generation (RAG) has recently emerged as a promising approach, enabling LLMs to selectively access relevant information from extended contexts and thereby improve efficiency. However, existing RAG approaches often lag behind other efficient long-context processing methods, primarily due to two inherent limitations: inaccurate retrieval and fragmented contexts. To address these limitations, we propose RetroLM, a novel RAG framework designed for effective long-context processing. Unlike traditional approaches, RetroLM introduces KV-level retrieval augmentation, which partitions the LLM's KV cache into contiguous pages and performs encoding and decoding on only the retrieved KV pages. Built upon this framework, we further develop a specialized retriever for precise retrieval of critical pages and conduct unsupervised post-training to optimize the model's ability to leverage retrieved information. Compared with traditional RAG, the new approach is more robust to retrieval inaccuracy, makes effective use of fragmented contexts, and avoids the cost of repeatedly re-encoding the context. We conduct extensive evaluations across several popular benchmarks, including LongBench, InfiniteBench, and RULER. RetroLM consistently outperforms existing long-context LLMs and RAG-based methods, especially in tasks requiring deep reasoning or extreme context lengths.
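
The core idea of KV-level retrieval augmentation can be sketched as follows. This is a minimal, illustrative toy, not the paper's implementation: the `KVPage` structure, the mean-pooled key-query dot-product scoring, and all names are assumptions standing in for RetroLM's trained page retriever.

```python
# Hypothetical sketch of KV-level retrieval augmentation: the KV cache is
# partitioned into contiguous fixed-size pages, each page is scored against
# the query, and only the top-k pages are kept (in original order) for
# decoding. The scoring heuristic here is an illustrative assumption, not
# RetroLM's actual retriever.
from dataclasses import dataclass

@dataclass
class KVPage:
    start: int    # index of the first token covered by this page
    keys: list    # per-token key vectors (lists of floats, for simplicity)
    values: list  # per-token value vectors

def paginate(keys, values, page_size):
    """Partition a KV cache into contiguous fixed-size pages."""
    return [
        KVPage(i, keys[i:i + page_size], values[i:i + page_size])
        for i in range(0, len(keys), page_size)
    ]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def retrieve_pages(pages, query, k):
    """Score each page by mean key-query similarity; keep top-k, in order."""
    scored = [(sum(dot(key, query) for key in p.keys) / len(p.keys), i)
              for i, p in enumerate(pages)]
    top = sorted(sorted(scored, reverse=True)[:k], key=lambda t: t[1])
    return [pages[i] for _, i in top]

if __name__ == "__main__":
    keys = [[float(t)] for t in range(8)]        # toy 1-d keys for 8 tokens
    vals = [[float(t) * 10] for t in range(8)]
    pages = paginate(keys, vals, page_size=2)    # 4 pages of 2 tokens each
    kept = retrieve_pages(pages, query=[1.0], k=2)
    print([p.start for p in kept])               # starts of the retained pages
```

Because decoding attends only to the retained pages, the cache passed to attention shrinks from the full context to `k * page_size` tokens, which is where the efficiency gain over re-encoding the whole context would come from.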

Published

2026-03-14

How to Cite

Luo, K., Liu, Z., Xiao, S., Chen, J., Qian, H., Zhang, P., Jiang, S., Dong, B., Zhao, J., & Liu, K. (2026). RetroLM: Retrieval-Augmented KVs for Long-Context Processing. Proceedings of the AAAI Conference on Artificial Intelligence, 40(38), 32365-32373. https://doi.org/10.1609/aaai.v40i38.40511

Section

AAAI Technical Track on Natural Language Processing III