Editing Memories Through Few Targeted Neurons

Authors

  • Wei Zhou — Cognitive Computing and Intelligent Information Processing (CCIIP) Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, China; Joint Laboratory of HUST and Pingan Property & Casualty Research (HPL), China
  • Wei Wei — Cognitive Computing and Intelligent Information Processing (CCIIP) Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, China; Joint Laboratory of HUST and Pingan Property & Casualty Research (HPL), China
  • Guibang Cao — Ping An Property & Casualty Insurance Company of China, Ltd.
  • Fei Wang — Institute of Computing Technology, Chinese Academy of Sciences

DOI:

https://doi.org/10.1609/aaai.v39i24.34807

Abstract

Model editing is an emerging research topic in large language models (LLMs), aimed at efficiently handling various knowledge editing tasks. Because irrelevant knowledge is difficult to measure, existing editing methods often lack explicit mechanisms for preserving it, especially methods based on the fine-tuning paradigm. These methods generally control the locality of model editing by constraining the range of changes to model parameters; however, the resulting improvements are not always satisfactory and may even reduce editing reliability. In this paper, we explore effective ways to control editing locality based on the relationship between stored knowledge and the model components most strongly associated with it. Building on the discovery of "knowledge neurons" and extensive experimental evidence, we further examine the relationship between knowledge and model components and show that: (1) only about 1% of neurons contribute significantly to the storage of a specific piece of knowledge, and (2) these targeted neurons often overlap heavily across knowledge with similar relational descriptions, meaning that knowledge with similar relations may be severely affected when these neurons are modified. Based on these findings, we propose Targeted Neurons Fine-tuning with Data Augmentation (TNF-DA), which performs data augmentation based on the relational representation of the edited knowledge to improve editing locality. By freezing most of the model parameters and fine-tuning only the highly contributing neurons corresponding to the edited knowledge, we obtain desirable results in terms of generalization and specificity compared with previous fine-tuning-based methods. Extensive experiments demonstrate the superior editing performance of our proposed method.
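The core mechanism described in the abstract, selecting the small fraction of neurons that contribute most to a piece of knowledge and updating only those while freezing everything else, can be illustrated with a minimal sketch. This is not the paper's implementation; the function names, the attribution scores, and the plain-list weight representation are all illustrative assumptions standing in for a real model and a real attribution method.

```python
def select_targeted_neurons(scores, fraction=0.01):
    """Return indices of the top `fraction` of neurons ranked by an
    attribution score (hypothetical stand-in for knowledge-neuron scores)."""
    k = max(1, int(len(scores) * fraction))
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def targeted_update(weights, grads, targeted, lr=0.1):
    """Apply one gradient step ONLY to the targeted neurons;
    all other weights remain frozen at their original values."""
    targeted = set(targeted)
    return [w - lr * g if i in targeted else w
            for i, (w, g) in enumerate(zip(weights, grads))]

# Toy usage: 100 neurons, one of which dominates the attribution scores.
scores = [0.1] * 100
scores[50] = 5.0
targeted = select_targeted_neurons(scores)          # picks the top 1% -> [50]
weights = [1.0] * 100
grads = [1.0] * 100
updated = targeted_update(weights, grads, targeted)  # only index 50 changes
```

In an actual LLM this masking would be applied to the gradient of specific feed-forward weight rows during fine-tuning, but the selective-update principle is the same.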

Published

2025-04-11

How to Cite

Zhou, W., Wei, W., Cao, G., & Wang, F. (2025). Editing Memories Through Few Targeted Neurons. Proceedings of the AAAI Conference on Artificial Intelligence, 39(24), 26111–26119. https://doi.org/10.1609/aaai.v39i24.34807

Section

AAAI Technical Track on Natural Language Processing III