RetroLM: Retrieval-Augmented KVs for Long-Context Processing

Authors

  • Kun Luo (The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; Beijing Academy of Artificial Intelligence, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China)
  • Zheng Liu (Beijing Academy of Artificial Intelligence, Beijing, China; Hong Kong Polytechnic University, Hong Kong, China)
  • Shitao Xiao (Beijing Academy of Artificial Intelligence, Beijing, China)
  • Jiabei Chen (The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; Beijing Academy of Artificial Intelligence, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China)
  • Hongjin Qian (Beijing Academy of Artificial Intelligence, Beijing, China; Peking University, Beijing, China)
  • Peitian Zhang (Beijing Academy of Artificial Intelligence, Beijing, China)
  • Shanshan Jiang (Ricoh Software Research Center Beijing, Ricoh Company, Ltd.)
  • Bin Dong (Ricoh Software Research Center Beijing, Ricoh Company, Ltd.)
  • Jun Zhao (The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China)
  • Kang Liu (The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China)

DOI:

https://doi.org/10.1609/aaai.v40i38.40511

Abstract

Long-context processing remains a significant challenge for large language models (LLMs). Retrieval-augmented generation (RAG) has recently emerged as a promising approach, enabling LLMs to selectively access relevant information from extended contexts and thereby improve efficiency. However, existing RAG approaches often lag behind other efficient long-context processing methods, primarily due to two inherent limitations: inaccurate retrieval and fragmented contexts. To address these limitations, we propose RetroLM, a novel RAG framework designed for effective long-context processing. Unlike traditional approaches, RetroLM introduces KV-level retrieval augmentation, which partitions the LLM's KV cache into contiguous pages and performs encoding and decoding on only the retrieved KV pages. Built upon this framework, we further develop a specialized retriever for precise retrieval of critical pages and conduct unsupervised post-training to optimize the model's ability to leverage retrieved information. Compared with traditional RAG, the new approach is more robust to retrieval inaccuracy, makes effective use of fragmented contexts, and avoids the cost of repeatedly re-encoding the context. We conduct extensive evaluations across several popular benchmarks, including LongBench, InfiniteBench, and RULER. RetroLM consistently outperforms existing long-context LLMs and RAG-based methods, especially in tasks requiring deep reasoning or extreme context lengths.
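
The core idea of KV-level retrieval augmentation can be sketched as follows. This is a minimal, illustrative toy, not the paper's implementation: the `KVPage` structure, the mean-pooled key-query dot-product scoring, and all names are assumptions standing in for RetroLM's trained page retriever.

```python
# Hypothetical sketch of KV-level retrieval augmentation: the KV cache is
# partitioned into contiguous fixed-size pages, each page is scored against
# the query, and only the top-k pages are kept (in original order) for
# decoding. The scoring heuristic here is an illustrative assumption, not
# RetroLM's actual retriever.
from dataclasses import dataclass

@dataclass
class KVPage:
    start: int    # index of the first token covered by this page
    keys: list    # per-token key vectors (lists of floats, for simplicity)
    values: list  # per-token value vectors

def paginate(keys, values, page_size):
    """Partition a KV cache into contiguous fixed-size pages."""
    return [
        KVPage(i, keys[i:i + page_size], values[i:i + page_size])
        for i in range(0, len(keys), page_size)
    ]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def retrieve_pages(pages, query, k):
    """Score each page by mean key-query similarity; keep top-k, in order."""
    scored = [(sum(dot(key, query) for key in p.keys) / len(p.keys), i)
              for i, p in enumerate(pages)]
    top = sorted(sorted(scored, reverse=True)[:k], key=lambda t: t[1])
    return [pages[i] for _, i in top]

if __name__ == "__main__":
    keys = [[float(t)] for t in range(8)]        # toy 1-d keys for 8 tokens
    vals = [[float(t) * 10] for t in range(8)]
    pages = paginate(keys, vals, page_size=2)    # 4 pages of 2 tokens each
    kept = retrieve_pages(pages, query=[1.0], k=2)
    print([p.start for p in kept])               # starts of the retained pages
```

Because decoding attends only to the retained pages, the cache passed to attention shrinks from the full context to `k * page_size` tokens, which is where the efficiency gain over re-encoding the whole context would come from.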

Published

2026-03-14

How to Cite

Luo, K., Liu, Z., Xiao, S., Chen, J., Qian, H., Zhang, P., Jiang, S., Dong, B., Zhao, J., & Liu, K. (2026). RetroLM: Retrieval-Augmented KVs for Long-Context Processing. Proceedings of the AAAI Conference on Artificial Intelligence, 40(38), 32365-32373. https://doi.org/10.1609/aaai.v40i38.40511

Section

AAAI Technical Track on Natural Language Processing III