SR-KI: Scalable and Real-Time Knowledge Integration into LLMs via Supervised Attention

Bohan Yu; Wei Huang; Kang Liu

doi:10.1609/aaai.v40i41.40747

Authors

Bohan Yu: School of Advanced Interdisciplinary Sciences, University of Chinese Academy of Sciences, Beijing, China; MEG, Baidu Inc., Beijing, China; The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, CAS, Beijing, China
Wei Huang: MEG, Baidu Inc., Beijing, China
Kang Liu: The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, CAS, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China

DOI:

https://doi.org/10.1609/aaai.v40i41.40747

Abstract

This paper proposes SR-KI, a novel approach for integrating real-time and large-scale structured knowledge bases (KBs) into large language models (LLMs). SR-KI begins by encoding KBs into key-value pairs using a pretrained encoder, and injects them into LLMs' KV cache. Building on this representation, we employ a two-stage training paradigm: first locating a dedicated retrieval layer within the LLM, and then applying an attention-based loss at this layer to explicitly supervise attention toward relevant KB entries. Unlike traditional retrieval-augmented generation methods that rely heavily on the performance of external retrievers and multi-stage pipelines, SR-KI supports end-to-end inference by performing retrieval entirely within the model’s latent space. This design enables efficient compression of injected knowledge and facilitates dynamic knowledge updates. Comprehensive experiments demonstrate that SR-KI enables the integration of up to 40K KBs into a 7B LLM on a single A100 40GB GPU, and achieves strong retrieval performance—maintaining over 98% Recall@10 on the best-performing task and exceeding 88% on average across all tasks. Task performance on question answering and KB ID generation also demonstrates that SR-KI maintains strong performance while achieving up to 99.75% compression of the injected KBs.
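
The abstract describes supervising attention at a retrieval layer so that it concentrates on the relevant KB entries, and it reports Recall@10 as the retrieval metric. The exact loss is not given here, so the following is only a minimal sketch under stated assumptions: one query vector attends over a handful of toy KB key vectors with scaled dot-product attention, the loss is taken to be the negative log of the attention mass on the relevant entries, and retrieval is read off as the top-k attention weights. All function names, the loss form, and the toy vectors are hypothetical illustrations, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_over_kb(query, kb_keys):
    """Scaled dot-product attention weights of one query over KB key vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in kb_keys]
    return softmax(scores)


def supervised_attention_loss(weights, relevant_ids):
    """Assumed loss form: negative log of the attention mass on relevant entries."""
    mass = sum(weights[i] for i in relevant_ids)
    return -math.log(max(mass, 1e-12))  # clamp to avoid log(0)


def top_k(weights, k):
    """Indices of the k KB entries that receive the most attention."""
    return sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:k]


def recall_at_k(weights, relevant_ids, k):
    """Fraction of relevant entries that appear among the top-k attended entries."""
    hits = len(set(top_k(weights, k)) & set(relevant_ids))
    return hits / len(relevant_ids)


# Toy KB of four key vectors (hypothetical values); entry 1 is the relevant one.
kb_keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.7, 0.7, 0.0]]
query = [0.0, 4.0, 0.0]

weights = attention_over_kb(query, kb_keys)
loss = supervised_attention_loss(weights, relevant_ids=[1])
retrieved = top_k(weights, k=2)
```

In this toy setup the loss shrinks as more attention mass lands on the relevant entry, which is the behaviour an attention-based supervision signal is meant to encourage. In a real model the query and keys would come from the chosen retrieval layer and the injected KV cache rather than from hand-written vectors.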
