Backdoor Attacks via Machine Unlearning

Authors

  • Zihao Liu, Iowa State University
  • Tianhao Wang, University of Virginia
  • Mengdi Huai, Iowa State University
  • Chenglin Miao, Iowa State University

DOI

https://doi.org/10.1609/aaai.v38i13.29321

Keywords

ML: Adversarial Learning & Robustness, ML: Classification and Regression, PEAI: Safety, Robustness & Trustworthiness

Abstract

As a new paradigm for erasing data from a model and protecting user privacy, machine unlearning has drawn significant attention. However, existing studies on machine unlearning mainly focus on its effectiveness and efficiency, neglecting the security challenges introduced by this technique. In this paper, we aim to bridge this gap and study the possibility of conducting malicious attacks that leverage machine unlearning. Specifically, we consider the backdoor attack via machine unlearning, where an attacker seeks to inject a backdoor into the unlearned model by submitting malicious unlearning requests, so that the prediction made by the unlearned model can be changed when a particular trigger is present. In our study, we propose two attack approaches. The first approach does not require the attacker to poison any training data of the model; the attacker can achieve the attack goal merely by requesting to unlearn a small subset of his contributed training data. The second approach allows the attacker to poison a few training instances with a pre-defined trigger upfront, and then activate the attack by submitting a malicious unlearning request. Both attack approaches are designed to maximize attack utility while ensuring attack stealthiness. The effectiveness of the proposed attacks is demonstrated across different machine unlearning algorithms, models, and datasets.
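To make the threat model concrete, here is a minimal, self-contained sketch of the second attack idea (poison upfront, activate via unlearning) on a toy nearest-centroid classifier. This is not the paper's algorithm: the classifier, the feature layout (the third feature acts as the trigger dimension), and all numbers are invented for illustration, and "unlearning" is modeled as exact retraining without the removed points.

```python
# Toy illustration (NOT the paper's method): the attacker contributes
# class-0 points whose trigger coordinates (+6 and -6) cancel out, so the
# initially trained model behaves normally. A later unlearning request that
# removes only the negative-trigger points shifts the class-0 centroid and
# activates the backdoor: triggered class-1 inputs flip to class 0.

def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train(class0, class1):
    # "Model" = the two class centroids.
    return (centroid(class0), centroid(class1))

def predict(model, x):
    c0, c1 = model
    return 0 if dist2(x, c0) <= dist2(x, c1) else 1

# Clean training data (feature 3 is the trigger dimension, normally 0).
clean0 = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]
clean1 = [(5, 4, 0), (3, 4, 0), (4, 5, 0), (4, 3, 0)]

# Attacker's contributed class-0 points: trigger coordinates cancel.
poison_keep   = [(0, 0, 6), (0, 0, 6)]
poison_remove = [(0, 0, -6), (0, 0, -6)]

model_before = train(clean0 + poison_keep + poison_remove, clean1)
trigger_x = (4, 4, 10)  # a class-1 input with the trigger planted
assert predict(model_before, trigger_x) == 1   # backdoor still dormant

# Malicious unlearning request: remove only the negative-trigger points.
model_after = train(clean0 + poison_keep, clean1)
assert predict(model_after, trigger_x) == 0    # backdoor activated
assert predict(model_after, (0, 0, 0)) == 0    # clean inputs unaffected
assert predict(model_after, (4, 4, 0)) == 1
```

The sketch captures the stealthiness requirement from the abstract: before the unlearning request, the poisoned points leave the model's behavior unchanged, and after it, clean inputs are still classified correctly while triggered inputs are misclassified.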

Published

2024-03-24

How to Cite

Liu, Z., Wang, T., Huai, M., & Miao, C. (2024). Backdoor Attacks via Machine Unlearning. Proceedings of the AAAI Conference on Artificial Intelligence, 38(13), 14115-14123. https://doi.org/10.1609/aaai.v38i13.29321

Section

AAAI Technical Track on Machine Learning IV