Adversarial Initialization with Universal Adversarial Perturbation: A New Approach to Fast Adversarial Training

Authors

  • Chao Pan: Research Institute of Trustworthy Autonomous Systems, Southern University of Science and Technology, Shenzhen 518055, China; Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China; The Hong Kong Polytechnic University, Hong Kong, China
  • Qing Li: The Hong Kong Polytechnic University, Hong Kong, China
  • Xin Yao: Research Institute of Trustworthy Autonomous Systems, Southern University of Science and Technology, Shenzhen 518055, China; Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China

DOI:

https://doi.org/10.1609/aaai.v38i19.30147

Keywords:

General

Abstract

Traditional adversarial training, while effective at improving machine learning model robustness, is computationally intensive. Fast Adversarial Training (FAT) addresses this by using a single-step attack to generate adversarial examples more efficiently. Nonetheless, FAT is susceptible to a phenomenon known as catastrophic overfitting, wherein the model's adversarial robustness abruptly collapses to zero during the training phase. To address this challenge, recent studies have suggested adopting adversarial initialization with Fast Gradient Sign Method Adversarial Training (FGSM-AT), which recycles adversarial perturbations from prior epochs by computing gradient momentum. However, our research has uncovered a flaw in this approach. Given that data augmentation is employed during the training phase, the samples in each epoch are not identical. Consequently, the method essentially yields not the adversarial perturbation of a singular sample, but rather the Universal Adversarial Perturbation (UAP) of a sample and its data augmentation. This insight has led us to explore the potential of using UAPs for adversarial initialization within the context of FGSM-AT. We have devised various strategies for adversarial initialization utilizing UAPs, including single, class-based, and feature-based UAPs. Experiments conducted on three distinct datasets demonstrate that our method achieves an improved trade-off among robustness, computational cost, and memory footprint. Code is available at https://github.com/fzjcdt/fgsm-uap.
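The core idea of the abstract — run FGSM-AT, but initialize each single-step attack from a maintained universal adversarial perturbation rather than from random noise — can be illustrated with a toy sketch. The following is an assumption-laden illustration on a NumPy logistic-regression model, not the authors' implementation (which is in the linked repository); the single-UAP variant, the update rule for the UAP (here, the mean of the epoch's perturbations), and all hyperparameters are placeholders chosen for clarity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_uap_train(X, y, eps=0.1, alpha=0.1, lr=0.5, epochs=50, seed=0):
    """Toy FGSM adversarial training on logistic regression, with a single
    universal adversarial perturbation (UAP) used as the attack's
    initialization. Illustrative sketch only, not the paper's code."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    # Shared adversarial initialization, maintained across epochs.
    uap = rng.uniform(-eps, eps, size=d)
    for _ in range(epochs):
        # One FGSM step, starting from the UAP instead of fresh random noise.
        # Input gradient of the BCE loss for each sample is (p - y) * w.
        p = sigmoid((X + uap) @ w + b)
        grad_x = np.outer(p - y, w)
        delta = np.clip(uap + alpha * np.sign(grad_x), -eps, eps)
        # Train on the resulting adversarial examples.
        X_adv = X + delta
        p_adv = sigmoid(X_adv @ w + b)
        w -= lr * (X_adv.T @ (p_adv - y) / n)
        b -= lr * np.mean(p_adv - y)
        # Refresh the UAP from this epoch's perturbations (placeholder rule:
        # average over samples, then re-project into the eps ball).
        uap = np.clip(delta.mean(axis=0), -eps, eps)
    return w, b, uap
```

Because `uap` is averaged over all samples before being reused, it plays the role the abstract describes: not the adversarial perturbation of any single sample, but a perturbation shared across the (augmented) dataset. The paper's class-based and feature-based variants would maintain one such vector per class or per feature group instead of a single one.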

Published

2024-03-24

How to Cite

Pan, C., Li, Q., & Yao, X. (2024). Adversarial Initialization with Universal Adversarial Perturbation: A New Approach to Fast Adversarial Training. Proceedings of the AAAI Conference on Artificial Intelligence, 38(19), 21501–21509. https://doi.org/10.1609/aaai.v38i19.30147

Issue

Section

AAAI Technical Track on Safe, Robust and Responsible AI Track