LoKI: Low-Damage Knowledge Implanting of Large Language Models
DOI:
https://doi.org/10.1609/aaai.v40i39.40651
Abstract
Fine-tuning adapts pretrained models for specific tasks but poses the risk of catastrophic forgetting (CF), where critical knowledge from pretraining is overwritten. To address the issue of CF in a general-purpose framework, we propose Low-damage Knowledge Implanting (LoKI), a parameter-efficient fine-tuning (PEFT) technique that utilizes recent mechanistic understanding of how knowledge is stored in transformer architectures. We compare LoKI against state-of-the-art PEFT methods in two real-world fine-tuning scenarios. The results show that LoKI demonstrates significantly better preservation of general capabilities. At the same time, its task-specific performance is comparable to or even surpasses that of full parameter fine-tuning and these PEFT methods across various model architectures. Our work bridges the mechanistic insights of LLMs' knowledge storage with practical fine-tuning objectives, enabling an effective balance between task-specific adaptation and the retention of general-purpose capabilities.
Published
2026-03-14
How to Cite
Wang, R., Ping, P., Guo, Z., Zhang, X., Shi, Q., Zhou, L., & Ji, T. (2026). LoKI: Low-Damage Knowledge Implanting of Large Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(39), 33620–33628. https://doi.org/10.1609/aaai.v40i39.40651
Issue
Section
AAAI Technical Track on Natural Language Processing IV