Dormant Backdoor: Weaponizing Model Finetuning for Feasible Backdoor Attacks Against Pretrained Models

Ruitao Li; Jiakai Wang; Hairong Chen; Huihu Ding; Jinghan Zhou; Renshuai Tao

doi:10.1609/aaai.v40i27.39480

Authors

Ruitao Li, Institute of Information Science, Beijing Jiaotong University
Jiakai Wang, Institute of Information Science, Beijing Jiaotong University
Hairong Chen, Institute of Information Science, Beijing Jiaotong University
Huihu Ding, Institute of Information Science, Beijing Jiaotong University
Jinghan Zhou, Institute of Information Science, Beijing Jiaotong University
Renshuai Tao, Institute of Information Science, Beijing Jiaotong University; Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, Anhui University; Visual Intelligence +X International Cooperation Joint Laboratory of MOE

Abstract

As the pretraining-finetuning paradigm becomes dominant in modern AI, the security of model supply chains faces new risks from backdoor attacks. Existing work primarily studies backdoors injected during pretraining and treats subsequent finetuning with clean data as a defense, while recent finetuning-activated attacks assume white-box access to the downstream data distribution, which is rarely realistic in practice. We introduce Dormant Backdoor, a finetuning-activated attack that requires no prior knowledge of downstream tasks. Instead of binding the backdoor to static input patterns, Dormant Backdoor exploits the universal dynamics of gradient-based optimization as a process-as-trigger mechanism. We formulate the attack as a bilevel optimization problem that simulates the victim's finetuning trajectory on proxy data, and jointly optimizes the poisoned model and trigger under lethality, utility, and stealth objectives. Before finetuning, the poisoned model remains behaviorally close to a clean model and can evade existing backdoor detectors; after finetuning, the same adaptation process reliably amplifies the backdoor on diverse downstream datasets and finetuning strategies. Our results reveal a previously underexplored class of process-as-trigger vulnerabilities and highlight the need for defenses that explicitly secure the model adaptation process.
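
The bilevel structure described in the abstract can be sketched generically as follows. This is an illustrative formulation only, not the paper's actual objective: every symbol here is an assumption introduced for exposition, including the poisoned weights \theta, the trigger t, the clean reference weights \theta_c, the proxy dataset D_p, the number of simulated finetuning steps K, the step size \eta, and the weights \lambda_u and \lambda_s.

```latex
% Outer problem: choose poisoned weights \theta and trigger t (all notation assumed).
\min_{\theta,\, t}\;
  \mathcal{L}_{\mathrm{leth}}\big(\theta_K(\theta),\, t\big)
  + \lambda_u\, \mathcal{L}_{\mathrm{util}}(\theta)
  + \lambda_s\, \mathcal{L}_{\mathrm{stealth}}(\theta,\, \theta_c,\, t)

% Inner problem: simulate the victim's gradient-based finetuning on proxy data D_p.
\text{s.t.}\quad
  \theta_0 = \theta, \qquad
  \theta_{k+1} = \theta_k - \eta\, \nabla_{\theta_k}
    \mathcal{L}_{\mathrm{ft}}(\theta_k;\, D_p),
  \quad k = 0, \dots, K-1 .
```

Under these assumptions, the lethality term is evaluated on the simulated finetuned weights \theta_K, so the backdoor is rewarded only after adaptation, while the utility and stealth terms are evaluated on the released weights \theta, which is what would keep the model behaviorally close to a clean one before finetuning.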
