SR-KI: Scalable and Real-Time Knowledge Integration into LLMs via Supervised Attention

Authors

  • Bohan Yu: School of Advanced Interdisciplinary Sciences, University of Chinese Academy of Sciences, Beijing, China; MEG, Baidu Inc., Beijing, China; The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, CAS, Beijing, China
  • Wei Huang: MEG, Baidu Inc., Beijing, China
  • Kang Liu: The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, CAS, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China

DOI

https://doi.org/10.1609/aaai.v40i41.40747

Abstract

This paper proposes SR-KI, a novel approach for integrating real-time and large-scale structured knowledge bases (KBs) into large language models (LLMs). SR-KI begins by encoding KBs into key-value pairs using a pretrained encoder, and injects them into LLMs' KV cache. Building on this representation, we employ a two-stage training paradigm: first locating a dedicated retrieval layer within the LLM, and then applying an attention-based loss at this layer to explicitly supervise attention toward relevant KB entries. Unlike traditional retrieval-augmented generation methods that rely heavily on the performance of external retrievers and multi-stage pipelines, SR-KI supports end-to-end inference by performing retrieval entirely within the model’s latent space. This design enables efficient compression of injected knowledge and facilitates dynamic knowledge updates. Comprehensive experiments demonstrate that SR-KI enables the integration of up to 40K KBs into a 7B LLM on a single A100 40GB GPU, and achieves strong retrieval performance—maintaining over 98% Recall@10 on the best-performing task and exceeding 88% on average across all tasks. Task performance on question answering and KB ID generation also demonstrates that SR-KI maintains strong performance while achieving up to 99.75% compression of the injected KBs.
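The abstract does not give the form of the attention-based loss, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes a single query vector at the retrieval layer, scaled dot-product attention over the injected KB keys, and a loss equal to the negative log of the attention mass placed on the relevant entries. The function names (`attention_supervision_loss`, `recall_at_k`) and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_supervision_loss(query, kb_keys, relevant_ids):
    """Assumed loss: -log of the attention mass on the relevant KB entries.

    query        -- query vector at the (hypothetical) retrieval layer
    kb_keys      -- one key vector per injected KB entry
    relevant_ids -- indices of the KB entries the query should attend to
    Returns (loss, attention_weights).
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in kb_keys]
    weights = softmax(scores)
    mass = sum(weights[i] for i in relevant_ids)
    return -math.log(mass), weights


def recall_at_k(weights, relevant_ids, k):
    """Fraction of relevant entries that appear among the top-k attention weights."""
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    hits = len(set(ranked[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


if __name__ == "__main__":
    # Toy data: three KB entries, the first one is the relevant one.
    query = [1.0, 0.0]
    kb_keys = [[4.0, 0.0], [0.0, 4.0], [-4.0, 0.0]]
    loss, weights = attention_supervision_loss(query, kb_keys, [0])
    print("loss:", loss)
    print("recall@1:", recall_at_k(weights, [0], 1))
```

Under these assumptions, minimizing the loss pushes attention toward the relevant entries, and reading off the top-k attention weights is what makes retrieval possible inside the model rather than through an external retriever.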

Published

2026-03-14

How to Cite

Yu, B., Huang, W., & Liu, K. (2026). SR-KI: Scalable and Real-Time Knowledge Integration into LLMs via Supervised Attention. Proceedings of the AAAI Conference on Artificial Intelligence, 40(41), 34486–34494. https://doi.org/10.1609/aaai.v40i41.40747

Section

AAAI Technical Track on Natural Language Processing VI