RetroLM: Retrieval-Augmented KVs for Long-Context Processing
DOI: https://doi.org/10.1609/aaai.v40i38.40511

Abstract
Long-context processing remains a significant challenge for large language models (LLMs). Retrieval-augmented generation (RAG) has recently emerged as a promising approach, enabling LLMs to selectively access relevant information from extended contexts to improve efficiency. However, existing RAG approaches often lag behind other efficient long-context processing methods, primarily due to two inherent limitations: inaccurate retrieval and fragmented contexts. To address these limitations, we propose RetroLM, a novel RAG framework designed for effective long-context processing. Unlike traditional approaches, RetroLM introduces KV-level retrieval augmentation, which partitions the LLM's KV cache into contiguous pages and performs encoding and decoding operations based on the retrieved KV pages. Built upon this framework, we further develop a specialized retriever for precise retrieval of critical pages and conduct unsupervised post-training to optimize the model's ability to leverage retrieved information. Compared with traditional RAG, the new approach enhances robustness to retrieval inaccuracy, facilitates effective utilization of fragmented contexts, and avoids the cost of repeated context-encoding operations. We conduct extensive evaluations across several popular benchmarks, including LongBench, InfiniteBench, and RULER. RetroLM consistently outperforms existing long-LLMs and RAG-based methods, especially in tasks requiring deep reasoning or extreme context lengths.
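The core mechanism described above — partitioning the KV cache into contiguous pages and retrieving only the relevant ones — can be illustrated with a minimal sketch. This is not the paper's implementation; the page-scoring heuristic here (dot product between a query vector and each page's mean-pooled keys) and all function names are illustrative assumptions, standing in for the specialized retriever the authors train.

```python
import numpy as np

def partition_kv_pages(keys: np.ndarray, page_size: int) -> list:
    """Split a per-token key cache of shape (seq_len, dim) into
    contiguous pages of at most page_size tokens each."""
    n = keys.shape[0]
    return [keys[i:i + page_size] for i in range(0, n, page_size)]

def retrieve_top_pages(keys: np.ndarray, query: np.ndarray,
                       page_size: int = 4, top_k: int = 2) -> list:
    """Score each KV page by the dot product between the query vector
    and the page's mean-pooled key (a stand-in for a learned retriever),
    then return the indices of the top_k pages in document order."""
    pages = partition_kv_pages(keys, page_size)
    scores = [float(query @ page.mean(axis=0)) for page in pages]
    ranked = sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)
    # Decoding then attends only to the retrieved pages, so the
    # original page order is preserved for the surviving indices.
    return sorted(ranked[:top_k])
```

For example, with an 8-token cache split into two pages of 4 tokens, a query aligned with the second page's keys would select page index 1, and decoding would attend only to that page's cached keys and values rather than the full context.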
Published
2026-03-14
How to Cite
Luo, K., Liu, Z., Xiao, S., Chen, J., Qian, H., Zhang, P., Jiang, S., Dong, B., Zhao, J., & Liu, K. (2026). RetroLM: Retrieval-Augmented KVs for Long-Context Processing. Proceedings of the AAAI Conference on Artificial Intelligence, 40(38), 32365-32373. https://doi.org/10.1609/aaai.v40i38.40511
Section
AAAI Technical Track on Natural Language Processing III