Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation

Authors

  • Xinshuo Hu Harbin Institute of Technology, Shenzhen
  • Dongfang Li Harbin Institute of Technology, Shenzhen
  • Baotian Hu Harbin Institute of Technology, Shenzhen
  • Zihao Zheng Harbin Institute of Technology, Shenzhen
  • Zhenyu Liu Harbin Institute of Technology, Shenzhen
  • Min Zhang Harbin Institute of Technology, Shenzhen

DOI:

https://doi.org/10.1609/aaai.v38i16.29784

Keywords:

NLP: (Large) Language Models, NLP: Generation

Abstract

Large language models (LLMs) have been widely used in various applications but are known to suffer from issues related to untruthfulness and toxicity. While parameter-efficient modules (PEMs) have demonstrated their effectiveness in equipping models with new skills, leveraging PEMs for deficiency unlearning remains underexplored. In this work, we propose a PEM operation approach, namely Extraction-before-Subtraction (Ext-Sub), to enhance the truthfulness and detoxification of LLMs through the integration of an "expert" PEM and an "anti-expert" PEM. Remarkably, even anti-expert PEMs possess valuable capabilities due to their proficiency in generating fabricated content, which requires language modeling and logical narrative competence. Rather than merely negating the parameters, our approach extracts and eliminates only the deficiency capability within the anti-expert PEM while preserving its general capabilities. To evaluate the effectiveness of our approach in terms of truthfulness and detoxification, we conduct extensive experiments on LLMs, encompassing additional abilities such as language modeling and mathematical reasoning. Our empirical results demonstrate that our approach effectively improves truthfulness and detoxification, while largely preserving the fundamental abilities of LLMs.
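The abstract contrasts plain parameter negation with extracting only the deficiency capability before subtraction. The following is a minimal, illustrative sketch of that idea, assuming PEM (e.g., LoRA) weight deltas are flattened into vectors, the shared capability is taken as the projection of the anti-expert PEM onto the expert PEM, and a scaling coefficient lam controls the subtraction; these modeling choices and names are assumptions for illustration, not the paper's exact formulation.

```python
# Illustrative sketch of the "extract, then subtract" idea (assumptions noted above),
# using plain PyTorch tensors as stand-ins for flattened PEM weight deltas.
import torch

def ext_sub(expert: torch.Tensor, anti_expert: torch.Tensor, lam: float = 1.0) -> torch.Tensor:
    """Remove only the 'deficiency' part of the anti-expert PEM from the expert PEM.

    Assumption: the capability shared with the expert lies along the expert's
    direction, so the deficiency is the component of the anti-expert orthogonal to it.
    """
    expert_dir = expert / expert.norm()
    shared = (anti_expert @ expert_dir) * expert_dir   # projection onto the expert direction
    deficiency = anti_expert - shared                  # orthogonal remainder treated as deficiency
    return expert - lam * deficiency                   # subtract only the deficiency component

# Toy usage with random stand-in parameter vectors
expert_pem = torch.randn(1024)
anti_expert_pem = torch.randn(1024)
merged_pem = ext_sub(expert_pem, anti_expert_pem, lam=1.0)
print(merged_pem.shape)
```

Naive negation would subtract the whole anti-expert vector, discarding its general language-modeling component along with the deficiency; the projection step in the sketch is what preserves that shared part.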

Published

2024-03-24

How to Cite

Hu, X., Li, D., Hu, B., Zheng, Z., Liu, Z., & Zhang, M. (2024). Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(16), 18252-18260. https://doi.org/10.1609/aaai.v38i16.29784

Section

AAAI Technical Track on Natural Language Processing I