Editing Memories Through Few Targeted Neurons

Authors

  • Wei Zhou — Cognitive Computing and Intelligent Information Processing (CCIIP) Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, China; Joint Laboratory of HUST and Pingan Property & Casualty Research (HPL), China
  • Wei Wei — Cognitive Computing and Intelligent Information Processing (CCIIP) Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, China; Joint Laboratory of HUST and Pingan Property & Casualty Research (HPL), China
  • Guibang Cao — Ping An Property & Casualty Insurance Company of China, Ltd.
  • Fei Wang — Institute of Computing Technology, Chinese Academy of Sciences

DOI:

https://doi.org/10.1609/aaai.v39i24.34807

Abstract

Model editing is an emerging research topic in large language models (LLMs), aimed at efficiently handling various knowledge editing tasks. Because irrelevant knowledge is difficult to measure, existing editing methods often lack explicit mechanisms for preserving it, especially methods based on the fine-tuning paradigm. These methods generally control the locality of model editing by constraining the range of changes to model parameters; however, the resulting improvements are not always satisfactory and may even reduce editing reliability. In this paper, we explore effective ways to control editing locality based on the relationship between stored knowledge and the model components most strongly associated with it. Building on the discovery of "knowledge neurons" and extensive experimental evidence, we further examine the relationship between knowledge and model components and show that: (1) only about 1% of neurons contribute significantly to the storage of a specific piece of knowledge, and (2) these targeted neurons often overlap heavily across knowledge with similar relational descriptions, meaning that knowledge with similar relations may be severely affected when these neurons are modified. Based on these findings, we propose Targeted Neurons Fine-tuning with Data Augmentation (TNF-DA), which performs data augmentation based on the relational representation of the edited knowledge to improve editing locality. By freezing most of the model parameters and fine-tuning only the highly contributing neurons corresponding to the edited knowledge, we obtain desirable results in terms of generalization and specificity compared with previous fine-tuning-based methods. Extensive experiments demonstrate the superior editing performance of our proposed method.
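The core mechanism described in the abstract, selecting the small fraction of neurons that contribute most to a piece of knowledge and updating only those while freezing everything else, can be illustrated with a minimal sketch. This is not the paper's implementation; the function names, the attribution scores, and the plain-list weight representation are all illustrative assumptions standing in for a real model and a real attribution method.

```python
def select_targeted_neurons(scores, fraction=0.01):
    """Return indices of the top `fraction` of neurons ranked by an
    attribution score (hypothetical stand-in for knowledge-neuron scores)."""
    k = max(1, int(len(scores) * fraction))
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def targeted_update(weights, grads, targeted, lr=0.1):
    """Apply one gradient step ONLY to the targeted neurons;
    all other weights remain frozen at their original values."""
    targeted = set(targeted)
    return [w - lr * g if i in targeted else w
            for i, (w, g) in enumerate(zip(weights, grads))]

# Toy usage: 100 neurons, one of which dominates the attribution scores.
scores = [0.1] * 100
scores[50] = 5.0
targeted = select_targeted_neurons(scores)          # picks the top 1% -> [50]
weights = [1.0] * 100
grads = [1.0] * 100
updated = targeted_update(weights, grads, targeted)  # only index 50 changes
```

In an actual LLM this masking would be applied to the gradient of specific feed-forward weight rows during fine-tuning, but the selective-update principle is the same.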

Published

2025-04-11

How to Cite

Zhou, W., Wei, W., Cao, G., & Wang, F. (2025). Editing Memories Through Few Targeted Neurons. Proceedings of the AAAI Conference on Artificial Intelligence, 39(24), 26111–26119. https://doi.org/10.1609/aaai.v39i24.34807

Section

AAAI Technical Track on Natural Language Processing III