Exploring Non-target Knowledge for Improving Ensemble Universal Adversarial Attacks

Authors

  • Juanjuan Weng Department of Artificial Intelligence, Xiamen University, China
  • Zhiming Luo Department of Artificial Intelligence, Xiamen University, China Fujian Key Laboratory of Big Data Application and Intellectualization for Tea Industry, Wuyi University, China
  • Zhun Zhong Department of Information Engineering and Computer Science, University of Trento, Italy
  • Dazhen Lin Department of Artificial Intelligence, Xiamen University, China
  • Shaozi Li Department of Artificial Intelligence, Xiamen University, China

DOI:

https://doi.org/10.1609/aaai.v37i3.25377

Keywords:

CV: Adversarial Attacks & Robustness

Abstract

Ensemble attacks with averaged weights can increase the transferability of a universal adversarial perturbation (UAP) by training it against multiple Convolutional Neural Networks (CNNs). However, after analyzing the Pearson Correlation Coefficients (PCCs) between the ensemble logits and the individual logits of a UAP crafted by the ensemble attack, we find that one CNN dominates the optimization. Consequently, the average-weighting strategy weakens the contributions of the other CNNs and thus limits transferability to other black-box CNNs. To address this bias, a first attempt is to add a Kullback–Leibler (KL) divergence loss that encourages joint contributions from the different CNNs, but this proves insufficient. By decoupling the KL loss into a target-class part and a non-target-class part, we find that the main issue is that non-target knowledge is significantly suppressed as the logit of the target class grows. In this study, we therefore adopt a KL loss that considers only the non-target classes to address the dominance bias. Furthermore, to boost transferability, we incorporate a min-max learning framework that self-adjusts the ensemble weight of each CNN. Experimental results validate that the non-target KL loss achieves superior transferability over the original KL loss by a large margin, and that min-max training provides a mutual benefit in adversarial ensemble attacks. The source code is available at: https://github.com/WJJLL/ND-MM.
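For readers who want a concrete picture of the non-target KL loss, the sketch below shows one plausible PyTorch implementation: the target-class probability is masked out and the remaining probabilities are renormalized before computing the KL divergence between the ensemble output and an individual CNN's output. The function name non_target_kl and the renormalization details are illustrative assumptions on our part, not the authors' exact formulation; see the linked repository for the official code.

  import torch
  import torch.nn.functional as F

  def non_target_kl(ens_logits, ind_logits, target, eps=1e-12):
      """KL divergence restricted to the non-target classes (a sketch).

      The target-class entry is zeroed out and the remaining probabilities
      are renormalized, so the loss is not dominated by the growing
      target-class logit during the targeted UAP optimization.
      """
      num_classes = ens_logits.size(1)
      mask = F.one_hot(target, num_classes).bool()       # (B, C), True at the target class
      p = torch.softmax(ens_logits, dim=1).masked_fill(mask, 0.0)
      q = torch.softmax(ind_logits, dim=1).masked_fill(mask, 0.0)
      p = p / p.sum(dim=1, keepdim=True).clamp_min(eps)  # renormalize over the C-1 non-target classes
      q = q / q.sum(dim=1, keepdim=True).clamp_min(eps)
      # KL(p || q) summed over classes, averaged over the batch
      return (p * (p.clamp_min(eps).log() - q.clamp_min(eps).log())).sum(dim=1).mean()

Masking out the target class keeps the loss sensitive to how the remaining classes are ranked across models, which is exactly the non-target knowledge that the growing target logit would otherwise suppress.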

Published

2023-06-26

How to Cite

Weng, J., Luo, Z., Zhong, Z., Lin, D., & Li, S. (2023). Exploring Non-target Knowledge for Improving Ensemble Universal Adversarial Attacks. Proceedings of the AAAI Conference on Artificial Intelligence, 37(3), 2768-2775. https://doi.org/10.1609/aaai.v37i3.25377

Section

AAAI Technical Track on Computer Vision III