Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification

Authors

  • Siyuan Cheng, Purdue University
  • Yingqi Liu, Purdue University
  • Shiqing Ma, Rutgers University
  • Xiangyu Zhang, Purdue University

Keywords:

Adversarial Attacks & Robustness, Adversarial Learning & Robustness

Abstract

A Trojan (backdoor) attack is a form of adversarial attack on deep neural networks in which the attacker provides victims with a model trained or retrained on malicious data. The backdoor is activated when a normal input is stamped with a specific pattern, called a trigger, which causes misclassification. Many existing trojan attacks use triggers that are input-space patches or objects (e.g., a polygon with a solid color) or simple input transformations such as Instagram filters. These simple triggers are susceptible to recent backdoor detection algorithms. We propose a novel deep feature space trojan attack with five characteristics: effectiveness, stealthiness, controllability, robustness, and reliance on deep features. We conduct extensive experiments on 9 image classifiers across various datasets, including ImageNet, to demonstrate these properties and show that our attack can evade state-of-the-art defenses.
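To make the contrast concrete, the sketch below illustrates the two kinds of *simple* triggers the abstract attributes to prior work: a solid-color patch stamped into the input, and a global color transformation. It is not the paper's deep feature space trigger, which the abstract does not specify. The function names `stamp_patch` and `sepia_filter`, the H x W x 3 list-of-lists image layout with 0..255 values, and the use of a sepia matrix as a stand-in for an "Instagram filter" are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical image layout: H x W x 3 nested lists, channel values in 0..255.
Image = List[List[List[int]]]


def stamp_patch(image: Image, top: int, left: int, size: int,
                color: Tuple[int, int, int] = (255, 0, 0)) -> Image:
    """Return a copy of `image` with a solid-color square trigger stamped in.

    This mimics an input-space patch trigger; the clean image is not modified.
    """
    h, w = len(image), len(image[0])
    if top < 0 or left < 0 or top + size > h or left + size > w:
        raise ValueError("patch does not fit inside the image")
    stamped = [[list(px) for px in row] for row in image]  # deep copy
    for r in range(top, top + size):
        for c in range(left, left + size):
            stamped[r][c] = list(color)
    return stamped


def sepia_filter(image: Image) -> Image:
    """Apply a sepia color matrix to every pixel.

    Stands in (as an assumption) for an Instagram-style filter trigger:
    the whole image is transformed rather than a local region.
    """
    out: Image = []
    for row in image:
        new_row = []
        for r, g, b in row:
            nr = min(255, int(0.393 * r + 0.769 * g + 0.189 * b))
            ng = min(255, int(0.349 * r + 0.686 * g + 0.168 * b))
            nb = min(255, int(0.272 * r + 0.534 * g + 0.131 * b))
            new_row.append([nr, ng, nb])
        out.append(new_row)
    return out


if __name__ == "__main__":
    clean = [[[10, 20, 30] for _ in range(8)] for _ in range(8)]
    poisoned = stamp_patch(clean, top=5, left=5, size=3)
    print(poisoned[6][6])  # inside the patch
    print(clean[6][6])     # original is untouched
    print(sepia_filter(clean)[0][0])
```

Both triggers are fixed, low-level pixel patterns, which is the property the abstract says makes them susceptible to existing backdoor detectors.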

Published

2021-05-18

How to Cite

Cheng, S., Liu, Y., Ma, S., & Zhang, X. (2021). Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification. Proceedings of the AAAI Conference on Artificial Intelligence, 35(2), 1148-1156. Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/16201

Section

AAAI Technical Track on Computer Vision I