DIFT: Protecting Contrastive Learning Against Data Poisoning Backdoor Attacks

Authors

  • Jiang Zhu, The Hong Kong Polytechnic University
  • Yulin Jin, The Hong Kong Polytechnic University
  • Qingqing Ye, The Hong Kong Polytechnic University
  • Zhibiao Guo, The Hong Kong Polytechnic University
  • Kun Fang, The Hong Kong Polytechnic University
  • Ruochen Du, Harbin Engineering University
  • Yingnan Zhao, Harbin Institute of Technology
  • Haibo Hu, The Hong Kong Polytechnic University

DOI:

https://doi.org/10.1609/aaai.v40i2.37141

Abstract

Contrastive learning (CL) is a popular learning paradigm that excels at extracting meaningful representations from unlabeled data. Recent studies have shown that CL is highly vulnerable to backdoor attacks. Current defenses against backdoor attacks in CL are primarily reactive and post-training: backdoors are detected and eliminated during the deployment phase of an already well-trained model. However, these post-training defenses tend to degrade model utility and are resource-intensive, making backdoor detection and elimination from a fully trained model quite challenging. To address this issue, we argue for a fundamentally different perspective, namely integrating the defense into the model's training phase, and propose a novel framework for mitigating backdoors in CL: Density-Based Identification and Fine-Tuning (DIFT). Specifically, DIFT identifies potentially poisoned samples during the early training phase by detecting embeddings with abnormal poisoning characteristics in the feature space. Then, to remove backdoors while preserving model utility, the detected poisoned samples are used to fine-tune the model, and the remaining clean samples are subsequently used to continue training after the fine-tuning. As a proactive training-time defense, DIFT avoids the problematic backdoor removal and high computational cost associated with reactive post-training methods. We empirically evaluate DIFT on various CL algorithms against backdoor attacks. Experimental results demonstrate that our method achieves promising defense effectiveness while maintaining the model's clean-data accuracy.
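To make the density-based identification idea concrete, here is a minimal, hypothetical sketch of flagging suspicious embeddings by abnormal local density. This is not the paper's actual detector: the k-NN density proxy, the z-score threshold, and all names below are illustrative assumptions; the intuition is only that trigger-carrying samples often form an unusually tight cluster in the feature space.

```python
import numpy as np

def knn_density_scores(emb, k=5):
    """Local-density proxy: mean distance to the k nearest neighbours
    of each (unit-normalised) embedding. Smaller = denser neighbourhood."""
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # ignore self-distance
    knn = np.sort(d, axis=1)[:, :k]      # k smallest distances per row
    return knn.mean(axis=1)

def flag_suspicious(emb, k=5, z_thresh=2.0):
    """Flag samples whose neighbourhood is abnormally dense, i.e. whose
    density score sits far below the mean (z-score < -z_thresh)."""
    s = knn_density_scores(emb, k)
    z = (s - s.mean()) / (s.std() + 1e-12)
    return np.where(z < -z_thresh)[0]

# Toy demo: 95 spread-out "clean" embeddings plus 5 near-duplicate
# "poisoned" ones clustered around a single point (indices 95..99).
rng = np.random.default_rng(0)
clean = rng.normal(size=(95, 16))
poison = rng.normal(size=(1, 16)) + 0.01 * rng.normal(size=(5, 16))
emb = np.vstack([clean, poison])
suspects = flag_suspicious(emb, k=4, z_thresh=2.0)
print(suspects)
```

In a training-time defense of this flavor, the flagged indices would then be treated as the potentially poisoned subset used for the fine-tuning step, while the rest continue as clean training data.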

Published

2026-03-14

How to Cite

Zhu, J., Jin, Y., Ye, Q., Guo, Z., Fang, K., Du, R., Zhao, Y., & Hu, H. (2026). DIFT: Protecting Contrastive Learning Against Data Poisoning Backdoor Attacks. Proceedings of the AAAI Conference on Artificial Intelligence, 40(2), 1641-1649. https://doi.org/10.1609/aaai.v40i2.37141

Section

AAAI Technical Track on Application Domains II