Dormant Backdoor: Weaponizing Model Finetuning for Feasible Backdoor Attacks Against Pretrained Models

Authors

  • Ruitao Li, Institute of Information Science, Beijing Jiaotong University
  • Jiakai Wang, Institute of Information Science, Beijing Jiaotong University
  • Hairong Chen, Institute of Information Science, Beijing Jiaotong University
  • Huihu Ding, Institute of Information Science, Beijing Jiaotong University
  • Jinghan Zhou, Institute of Information Science, Beijing Jiaotong University
  • Renshuai Tao, Institute of Information Science, Beijing Jiaotong University; Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, Anhui University; Visual Intelligence +X International Cooperation Joint Laboratory of MOE

DOI:

https://doi.org/10.1609/aaai.v40i27.39480

Abstract

As the pretraining-finetuning paradigm becomes dominant in modern AI, the security of model supply chains faces new risks from backdoor attacks. Existing work primarily studies backdoors injected during pretraining and treats subsequent finetuning with clean data as a defense, while recent finetuning-activated attacks assume white-box access to the downstream data distribution, which is rarely realistic in practice. We introduce Dormant Backdoor, a finetuning-activated attack that requires no prior knowledge of downstream tasks. Instead of binding the backdoor to static input patterns, Dormant Backdoor exploits the universal dynamics of gradient-based optimization as a process-as-trigger mechanism. We formulate the attack as a bilevel optimization problem that simulates the victim's finetuning trajectory on proxy data, and jointly optimizes the poisoned model and trigger under lethality, utility, and stealth objectives. Before finetuning, the poisoned model remains behaviorally close to a clean model and can evade existing backdoor detectors; after finetuning, the same adaptation process reliably amplifies the backdoor on diverse downstream datasets and finetuning strategies. Our results reveal a previously underexplored class of process-as-trigger vulnerabilities and highlight the need for defenses that explicitly secure the model adaptation process.
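
The bilevel structure described in the abstract can be illustrated with a generic sketch. This is not the paper's formulation: the symbols, the plain gradient-descent inner loop, the number of simulated steps K, the step size \eta, and the weights \lambda_u and \lambda_s are all assumptions introduced here for illustration. The inner level simulates finetuning on proxy data; the outer level scores the released weights and trigger on the three objectives the abstract names.

```latex
% Illustrative sketch only; every symbol below is an assumption, not the paper's notation.
% \theta : released (poisoned) weights      \delta : trigger
% D_p    : proxy data used to simulate the victim's finetuning
% Inner level: K simulated finetuning steps starting from the released weights.
\begin{align}
  \theta_0 &= \theta, \\
  \theta_{k+1} &= \theta_k - \eta \, \nabla_{\theta_k} \mathcal{L}_{\mathrm{ft}}(\theta_k; D_p),
  \qquad k = 0, \dots, K-1 .
\end{align}
% Outer level: choose weights and trigger so that the backdoor is effective
% after finetuning (lethality), clean performance is preserved (utility),
% and the model stays close to a clean model before finetuning (stealth).
\begin{equation}
  \min_{\theta,\, \delta} \;
  \mathcal{L}_{\mathrm{leth}}(\theta_K, \delta)
  + \lambda_u \, \mathcal{L}_{\mathrm{util}}(\theta)
  + \lambda_s \, \mathcal{L}_{\mathrm{stealth}}(\theta, \delta)
\end{equation}
```

In this reading, \theta_K depends on \theta through the simulated trajectory, so the lethality term is evaluated on the finetuned weights while the utility and stealth terms are evaluated on the weights as released. How the paper actually defines and balances these terms is not stated in the abstract.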

Published

2026-03-14

How to Cite

Li, R., Wang, J., Chen, H., Ding, H., Zhou, J., & Tao, R. (2026). Dormant Backdoor: Weaponizing Model Finetuning for Feasible Backdoor Attacks Against Pretrained Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(27), 23132–23140. https://doi.org/10.1609/aaai.v40i27.39480

Issue

Vol. 40 No. 27 (2026)

Section

AAAI Technical Track on Machine Learning IV