Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage

Authors

  • Md Rafi Ur Rashid — Mitsubishi Electric Research Laboratories; Pennsylvania State University
  • Jing Liu — Mitsubishi Electric Research Laboratories
  • Toshiaki Koike-Akino — Mitsubishi Electric Research Laboratories
  • Ye Wang — Mitsubishi Electric Research Laboratories
  • Shagufta Mehnaz — Pennsylvania State University

DOI:

https://doi.org/10.1609/aaai.v39i19.34218

Abstract

Fine-tuning large language models on private data for downstream applications poses significant privacy risks, as sensitive information may be exposed. Several popular community platforms now offer convenient distribution of a wide variety of pre-trained models, allowing anyone to publish without rigorous verification. This creates a privacy threat: pre-trained models can be intentionally crafted to compromise the privacy of fine-tuning datasets. In this study, we introduce a novel poisoning technique that uses machine unlearning as an attack tool. This approach manipulates a pre-trained language model to increase the leakage of private data during fine-tuning. Our method enhances both membership inference and data extraction attacks while preserving model utility. Experimental results across different models, datasets, and fine-tuning setups demonstrate that our attacks significantly surpass baseline performance. This work serves as a cautionary note for users who download pre-trained models from unverified sources, highlighting the potential risks involved.

Published

2025-04-11

How to Cite

Rashid, M. R. U., Liu, J., Koike-Akino, T., Wang, Y., & Mehnaz, S. (2025). Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage. Proceedings of the AAAI Conference on Artificial Intelligence, 39(19), 20139–20147. https://doi.org/10.1609/aaai.v39i19.34218

Section

AAAI Technical Track on Machine Learning V