Poisoned Distillation: Injecting Backdoors into Distilled Datasets Without Raw Data Access

Authors

  • Ziyuan Yang: School of Cyber Science and Engineering, Sichuan University; Centre for Frontier AI Research (CFAR), Agency for Science, Technology and Research (A*STAR), Singapore; Tianfu Jiangxi Laboratory
  • Ming Yan: Centre for Frontier AI Research (CFAR), Agency for Science, Technology and Research (A*STAR); Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR)
  • Yi Zhang: School of Cyber Science and Engineering, Sichuan University; Tianfu Jiangxi Laboratory
  • Joey Tianyi Zhou: Centre for Frontier AI Research (CFAR), Agency for Science, Technology and Research (A*STAR); Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR)

DOI:

https://doi.org/10.1609/aaai.v40i2.37119

Abstract

Dataset distillation (DD) condenses large datasets into smaller synthetic ones to improve training efficiency and reduce bandwidth. Models trained on a distilled dataset can achieve performance comparable to that of models trained on the full raw dataset, which makes DD popular for data sharing. Existing work shows that injecting backdoors during the distillation process can threaten downstream models. However, these studies assume that attackers have access to the raw dataset and can interfere with the entire distillation process, which is unrealistic. In contrast, this work is the first to address a more realistic and concerning threat: attackers may intercept the dataset distribution process, inject backdoors into the distilled datasets, and redistribute them to users. Although distilled datasets were previously considered resistant to backdoor attacks, we demonstrate that they remain vulnerable. Furthermore, we show that attackers need no access to any raw data and can inject the backdoors successfully within one minute. Specifically, our approach reconstructs conceptual archetypes for each class from a model trained on the distilled dataset. Backdoors are then injected into these archetypes, which are used to update the distilled dataset. Moreover, we ensure that the updated dataset not only retains the backdoor but also preserves the original optimization trajectory, thus maintaining the knowledge of the raw dataset. To achieve this, we design a hybrid loss that integrates backdoor information along the benign optimization trajectory, ensuring that previously learned information is not forgotten. Extensive experiments demonstrate that distilled datasets are highly vulnerable to our attack, with risks pervasive across various raw datasets, distillation methods, and downstream training strategies.
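
The abstract gives no implementation details, so the following is only a rough illustrative sketch of two of the steps it names: injecting a backdoor into per-class archetypes, and combining a benign term with a backdoor term in a hybrid loss. Everything concrete here is an assumption made for illustration and is not taken from the paper: the patch-style trigger, the relabeling to a single target class, the weighted-sum form of the loss, and the names stamp_trigger, poison_archetypes, and hybrid_loss. Random arrays stand in for the reconstructed archetypes.

```python
import numpy as np


def stamp_trigger(images, trigger, top, left):
    """Return a copy of `images` (N, H, W, C) with `trigger` (h, w, C)
    pasted at row `top`, column `left` of every image."""
    poisoned = images.copy()
    h, w = trigger.shape[:2]
    poisoned[:, top:top + h, left:left + w, :] = trigger
    return poisoned


def poison_archetypes(archetypes, trigger, target_class, top=0, left=0):
    """Stamp the trigger on every archetype and relabel all of them to
    `target_class` (an assumed, patch-style backdoor; hypothetical helper)."""
    poisoned = stamp_trigger(archetypes, trigger, top, left)
    labels = np.full(len(archetypes), target_class, dtype=np.int64)
    return poisoned, labels


def hybrid_loss(benign_term, backdoor_term, weight=1.0):
    """Assumed form of a hybrid objective: benign term plus a weighted
    backdoor term. The paper's actual loss may differ."""
    return benign_term + weight * backdoor_term


# Placeholder data: 10 random "archetypes" of shape 32x32x3, one per class.
rng = np.random.default_rng(0)
archetypes = rng.random((10, 32, 32, 3)).astype(np.float32)

# A 3x3 white patch in the bottom-right corner serves as the trigger.
trigger = np.ones((3, 3, 3), dtype=np.float32)
poisoned, labels = poison_archetypes(
    archetypes, trigger, target_class=0, top=29, left=29
)
```

In this sketch the benign term would measure how well the updated dataset still follows the original training behavior, and the backdoor term how well it encodes the trigger-to-target mapping; how either term is actually computed is not specified in the abstract.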

Published

2026-03-14

How to Cite

Yang, Z., Yan, M., Zhang, Y., & Zhou, J. T. (2026). Poisoned Distillation: Injecting Backdoors into Distilled Datasets Without Raw Data Access. Proceedings of the AAAI Conference on Artificial Intelligence, 40(2), 1444–1452. https://doi.org/10.1609/aaai.v40i2.37119

Section

AAAI Technical Track on Application Domains II