Efficient Context Retention in LLMs: Enhancing In-Context Memorization as an Alternative
DOI:
https://doi.org/10.1609/aaaiss.v7i1.36933
Abstract
Large Language Models (LLMs) are widely utilized for tasks requiring contextual understanding; however, their reliance on large context windows introduces significant computational overhead due to the transformer's quadratic complexity. This inefficiency is a critical barrier to their deployment in resource-constrained settings such as rural healthcare, where processing longitudinal patient data from Electronic Health Records (EHRs) is essential. To address this, our research investigates an alternative paradigm: training lightweight, specialized models for complete knowledge internalization, enabling them to function as persistent and efficient knowledge bases on local hardware. Our methodology involves training a 12-layer, 124-million-parameter nanoGPT model de novo on specialized subsets of the MMLU benchmark, including domains relevant to healthcare. The training objective was explicitly data internalization, not generalization. The entire domain-specific corpus, consisting of over 250,000 tokens formatted for a question-and-answer recall task, was used for training until the model achieved near-zero training loss. Performance was then evaluated on the model's ability to perfectly reproduce answers from a "seen" validation set, with recall certainty quantified via softmax probabilities. The resulting models successfully internalized their respective knowledge domains, achieving near-100% accuracy on recall tasks with high confidence scores. This outcome validates that targeted training for memorization can produce reliable and computationally efficient expert agents. For rural health, this approach offers a practical alternative to large context windows, enabling the deployment of a fleet of specialized models on local hardware for tasks such as patient history recall or clinical guideline retrieval. This drastically reduces computational costs and latency, providing a scalable solution without requiring continuous, high-bandwidth cloud access.
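The abstract describes quantifying recall certainty via softmax probabilities over answer tokens. The sketch below (not the authors' code) illustrates one way this can be done: it scores a hypothetical memorized question-answer pair by reading off the softmax probability the model assigns to each answer token given the preceding context. It uses the 124-million-parameter GPT-2 checkpoint from Hugging Face as a stand-in for the trained nanoGPT model, and the prompt/answer strings are illustrative assumptions.

import torch
import torch.nn.functional as F
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Stand-in for the domain-specific 124M-parameter model described in the abstract.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

# Hypothetical Q&A recall item in the question-and-answer format the corpus uses.
question = "Q: Which vitamin deficiency causes scurvy?\nA:"
answer = " Vitamin C"

q_ids = tokenizer(question, return_tensors="pt").input_ids
a_ids = tokenizer(answer, return_tensors="pt").input_ids
input_ids = torch.cat([q_ids, a_ids], dim=1)

with torch.no_grad():
    logits = model(input_ids).logits  # shape: (1, seq_len, vocab_size)

# Softmax over the vocabulary at each position; row i predicts token i+1.
probs = F.softmax(logits[0, :-1], dim=-1)

# Probability assigned to each answer token, conditioned on everything before it.
answer_positions = range(q_ids.shape[1] - 1, input_ids.shape[1] - 1)
token_probs = [probs[pos, input_ids[0, pos + 1]].item() for pos in answer_positions]

# Recall certainty summarized as the mean per-token probability; a fully
# memorized pair should yield probabilities near 1.0 for every answer token.
print("per-token probabilities:", [round(p, 3) for p in token_probs])
print("recall certainty (mean):", sum(token_probs) / len(token_probs))

Under this reading, a model trained to near-zero loss on the seen corpus would concentrate nearly all probability mass on the memorized answer tokens, which is consistent with the high confidence scores reported in the abstract.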
Published
2025-11-23
How to Cite
Patel, B., & Kim, E. (2025). Efficient Context Retention in LLMs: Enhancing In-Context Memorization as an Alternative. Proceedings of the AAAI Symposium Series, 7(1), 566-566. https://doi.org/10.1609/aaaiss.v7i1.36933
Issue
Section
Safe, Ethical, Certified, Uncertainty-aware, Robust, and Explainable AI for Health (SECURE-AI4H)