Inspecting Prediction Confidence for Detecting Black-Box Backdoor Attacks

Authors

  • Tong Wang, Nanjing University
  • Yuan Yao, Nanjing University
  • Feng Xu, Nanjing University
  • Miao Xu, University of Queensland
  • Shengwei An, Purdue University
  • Ting Wang, Stony Brook University

DOI:

https://doi.org/10.1609/aaai.v38i1.27780

Keywords:

APP: Security

Abstract

Backdoor attacks have been shown to be a serious security threat against deep learning models, and various defenses have been proposed to detect whether a model is backdoored or not. However, as demonstrated by a recent black-box attack, existing defenses can be easily bypassed by implanting the backdoor in the frequency domain. In response, we propose DTInspector, a new defense against black-box backdoor attacks, based on a new observation about the prediction confidence of learning models: to achieve a high attack success rate with a small amount of poisoned data, backdoor attacks usually cause a model to exhibit statistically higher prediction confidence on poisoned samples. We provide both theoretical and empirical evidence for the generality of this observation. DTInspector then carefully examines the prediction confidences of data samples and decides the existence of a backdoor by exploiting the shortcut nature of backdoor triggers. Extensive evaluations on six backdoor attacks, four datasets, and three advanced attack types demonstrate the effectiveness of the proposed defense.
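The core signal the abstract describes, poisoned samples drawing statistically higher prediction confidence than clean ones, can be illustrated with a minimal sketch. This is not the authors' DTInspector procedure; it is a toy comparison of top-1 softmax confidences between two sample sets, with synthetic logits standing in for real model outputs (the `confidence_gap` helper and the +6 logit boost are illustrative assumptions):

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def confidence_gap(logits_a, logits_b):
    """Difference in mean top-1 prediction confidence between two sets.

    A large positive gap (set A markedly more confident than set B) is
    the kind of statistical discrepancy the abstract attributes to
    poisoned versus clean samples.
    """
    conf_a = softmax(logits_a).max(axis=-1)
    conf_b = softmax(logits_b).max(axis=-1)
    return conf_a.mean() - conf_b.mean()

# Toy illustration: "poisoned-like" logits with an exaggerated score on
# one target class, versus flatter "clean-like" logits.
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=(1000, 10))
poisoned = clean.copy()
poisoned[:, 0] += 6.0  # hypothetical backdoor shortcut boosting class 0

gap = confidence_gap(poisoned, clean)
print(f"mean confidence gap: {gap:.3f}")  # clearly positive
```

In practice such a gap would be tested statistically rather than eyeballed, and the defender would not know which samples are poisoned in advance; the paper's method addresses exactly that harder black-box setting.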

Published

2024-03-25

How to Cite

Wang, T., Yao, Y., Xu, F., Xu, M., An, S., & Wang, T. (2024). Inspecting Prediction Confidence for Detecting Black-Box Backdoor Attacks. Proceedings of the AAAI Conference on Artificial Intelligence, 38(1), 274-282. https://doi.org/10.1609/aaai.v38i1.27780

Section

AAAI Technical Track on Application Domains