Backdoor Attacks via Machine Unlearning

Authors

  • Zihao Liu, Iowa State University
  • Tianhao Wang, University of Virginia
  • Mengdi Huai, Iowa State University
  • Chenglin Miao, Iowa State University

DOI

https://doi.org/10.1609/aaai.v38i13.29321

Keywords

ML: Adversarial Learning & Robustness, ML: Classification and Regression, PEAI: Safety, Robustness & Trustworthiness

Abstract

As a new paradigm for erasing data from a model and protecting user privacy, machine unlearning has drawn significant attention. However, existing studies on machine unlearning mainly focus on its effectiveness and efficiency, neglecting the security challenges introduced by this technique. In this paper, we aim to bridge this gap and study the possibility of conducting malicious attacks that leverage machine unlearning. Specifically, we consider the backdoor attack via machine unlearning, where an attacker seeks to inject a backdoor into the unlearned model by submitting malicious unlearning requests, so that the prediction made by the unlearned model can be changed when a particular trigger is present. In our study, we propose two attack approaches. The first approach does not require the attacker to poison any training data of the model; the attacker can achieve the attack goal merely by requesting to unlearn a small subset of his contributed training data. The second approach allows the attacker to poison a few training instances with a pre-defined trigger upfront, and then activate the attack by submitting a malicious unlearning request. Both attack approaches are designed to maximize attack utility while ensuring attack stealthiness. The effectiveness of the proposed attacks is demonstrated across different machine unlearning algorithms, models, and datasets.
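To make the threat model concrete, here is a minimal, self-contained sketch of the second attack idea (poison upfront, activate via unlearning) on a toy nearest-centroid classifier. This is not the paper's algorithm: the classifier, the feature layout (the third feature acts as the trigger dimension), and all numbers are invented for illustration, and "unlearning" is modeled as exact retraining without the removed points.

```python
# Toy illustration (NOT the paper's method): the attacker contributes
# class-0 points whose trigger coordinates (+6 and -6) cancel out, so the
# initially trained model behaves normally. A later unlearning request that
# removes only the negative-trigger points shifts the class-0 centroid and
# activates the backdoor: triggered class-1 inputs flip to class 0.

def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train(class0, class1):
    # "Model" = the two class centroids.
    return (centroid(class0), centroid(class1))

def predict(model, x):
    c0, c1 = model
    return 0 if dist2(x, c0) <= dist2(x, c1) else 1

# Clean training data (feature 3 is the trigger dimension, normally 0).
clean0 = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]
clean1 = [(5, 4, 0), (3, 4, 0), (4, 5, 0), (4, 3, 0)]

# Attacker's contributed class-0 points: trigger coordinates cancel.
poison_keep   = [(0, 0, 6), (0, 0, 6)]
poison_remove = [(0, 0, -6), (0, 0, -6)]

model_before = train(clean0 + poison_keep + poison_remove, clean1)
trigger_x = (4, 4, 10)  # a class-1 input with the trigger planted
assert predict(model_before, trigger_x) == 1   # backdoor still dormant

# Malicious unlearning request: remove only the negative-trigger points.
model_after = train(clean0 + poison_keep, clean1)
assert predict(model_after, trigger_x) == 0    # backdoor activated
assert predict(model_after, (0, 0, 0)) == 0    # clean inputs unaffected
assert predict(model_after, (4, 4, 0)) == 1
```

The sketch captures the stealthiness requirement from the abstract: before the unlearning request, the poisoned points leave the model's behavior unchanged, and after it, clean inputs are still classified correctly while triggered inputs are misclassified.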

Published

2024-03-24

How to Cite

Liu, Z., Wang, T., Huai, M., & Miao, C. (2024). Backdoor Attacks via Machine Unlearning. Proceedings of the AAAI Conference on Artificial Intelligence, 38(13), 14115-14123. https://doi.org/10.1609/aaai.v38i13.29321

Section

AAAI Technical Track on Machine Learning IV