Selective Forgetting: Advancing Machine Unlearning Techniques and Evaluation in Language Models

Authors

  • Lingzhi Wang — Harbin Institute of Technology, Shenzhen, China
  • Xingshan Zeng — Huawei Noah's Ark Lab, Hong Kong, China
  • Jinsong Guo — Unlimidata Ltd, United Kingdom
  • Kam-Fai Wong — The Chinese University of Hong Kong, Hong Kong, China; MoE Key Laboratory of High Confidence Software Technologies, China
  • Georg Gottlob — University of Calabria, Italy

DOI:

https://doi.org/10.1609/aaai.v39i1.32068

Abstract

This paper explores Machine Unlearning (MU), an emerging field that is gaining increased attention due to concerns about neural models unintentionally remembering personal or sensitive information. We present SeUL, a novel method that enables selective and fine-grained unlearning for language models. Unlike previous work that employs a fully reversed training objective in unlearning, SeUL minimizes the negative impact on the capability of language models, particularly in terms of generation. Furthermore, we introduce two innovative evaluation metrics, sensitive extraction likelihood (S-EL) and sensitive memorization accuracy (S-MA), specifically designed to assess the effectiveness of forgetting sensitive information. In support of the unlearning framework, we propose efficient automatic online and offline sensitive span annotation methods. The online selection method, based on language probability scores, ensures computational efficiency, while the offline annotation involves a two-stage LLM-based process for robust verification. In summary, this paper contributes a novel selective unlearning method (SeUL), introduces specialized evaluation metrics (S-EL and S-MA) for assessing sensitive information forgetting, and proposes automatic online and offline sensitive span annotation methods to support the overall unlearning framework and evaluation.
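The abstract mentions an online sensitive-span selection method based on language probability scores. As a minimal sketch of that general idea (not the paper's actual algorithm): a language model assigns low probability to rare, instance-specific tokens such as names or numbers, so contiguous runs of low-probability tokens can be flagged as candidate sensitive spans. The threshold and token probabilities below are illustrative assumptions.

```python
def select_sensitive_spans(token_probs, threshold=0.05):
    """Flag contiguous runs of low-probability tokens as candidate sensitive spans.

    token_probs: list of (token, probability) pairs from a language model.
    Returns a list of (start, end) index pairs, end exclusive.
    """
    spans, start = [], None
    for i, (_, p) in enumerate(token_probs):
        if p < threshold:
            if start is None:
                start = i          # open a new low-probability span
        elif start is not None:
            spans.append((start, i))  # close the span at the first confident token
            start = None
    if start is not None:
        spans.append((start, len(token_probs)))
    return spans

# Toy example: the digits of a phone number get low model probability.
probs = [("My", 0.30), ("phone", 0.20), ("number", 0.40), ("is", 0.50),
         ("555", 0.01), ("-", 0.60), ("0199", 0.02)]
print(select_sensitive_spans(probs))  # → [(4, 5), (6, 7)]
```

Such a score-based pass is cheap because it reuses probabilities the model already computes, which is why the abstract contrasts it with the heavier two-stage LLM-based offline annotation.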

Published

2025-04-11

How to Cite

Wang, L., Zeng, X., Guo, J., Wong, K.-F., & Gottlob, G. (2025). Selective Forgetting: Advancing Machine Unlearning Techniques and Evaluation in Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 39(1), 843-851. https://doi.org/10.1609/aaai.v39i1.32068

Section

AAAI Technical Track on Application Domains