AGS: Affordable and Generalizable Substitute Training for Transferable Adversarial Attack

Authors

  • Ruikui Wang, State Key Laboratory of Software Development Environment, Beihang University, China; School of Computer Science and Engineering, Beihang University, China
  • Yuanfang Guo, State Key Laboratory of Software Development Environment, Beihang University, China; School of Computer Science and Engineering, Beihang University, China
  • Yunhong Wang, School of Computer Science and Engineering, Beihang University, China

DOI:

https://doi.org/10.1609/aaai.v38i6.28365

Keywords:

CV: Adversarial Attacks & Robustness, ML: Adversarial Learning & Robustness

Abstract

In practical black-box attack scenarios, most existing transfer-based attacks employ pretrained models (e.g., ResNet50) as the substitute models. Unfortunately, these substitute models are not always appropriate for transfer-based attacks. Firstly, these models are usually trained on a large-scale annotated dataset, which is extremely expensive and time-consuming to construct. Secondly, the primary goal of these models is to perform a specific task, such as image classification, so they are not designed for adversarial attacks. To tackle the above issues, i.e., high cost and over-fitting to task-specific models, we propose an Affordable and Generalizable Substitute (AGS) training framework tailored for transfer-based adversarial attacks. Specifically, we train the substitute model from scratch with our proposed adversary-centric contrastive learning. This learning mechanism introduces another sample with slight adversarial perturbations as an additional positive view of the input image, and then encourages the adversarial view and the two benign views to interact comprehensively with each other. To further boost the generalizability of the substitute model, we propose adversarial invariant learning to keep the representations of the adversarial example invariant under augmentations of various strengths. Owing to its inherently self-supervised nature, our AGS model can be trained solely with unlabeled and out-of-domain data and avoids over-fitting to any task-specific model. Extensive experiments demonstrate that our AGS achieves comparable or superior performance to substitute models pretrained on the complete ImageNet training set, when executing attacks across a diverse range of target models, including ViTs, robustly trained models, and object detection and segmentation models. Our source code is available at https://github.com/lwmming/AGS.
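The adversary-centric idea described in the abstract can be illustrated with a toy sketch: an InfoNCE-style contrastive loss is averaged over all ordered anchor/positive pairs among two benign augmented views and one adversarial view, so the adversarial view interacts with both benign views. This is only a minimal, pure-Python illustration of the general mechanism; the function names, the symmetric pairing scheme, and the use of raw vectors in place of a trained encoder are assumptions for exposition, not the paper's exact loss.

```python
import math

def cosine(u, v):
    # Cosine similarity between two plain Python vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchor, positive, negatives, tau=0.1):
    # Standard InfoNCE: the positive competes against the negatives
    # in a temperature-scaled softmax over cosine similarities.
    logits = [cosine(anchor, positive) / tau]
    logits += [cosine(anchor, n) / tau for n in negatives]
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    return -math.log(exps[0] / sum(exps))

def adversary_centric_loss(benign1, benign2, adv, negatives, tau=0.1):
    # Toy version of adversary-centric contrastive learning: the
    # adversarial view joins the two benign views as a third positive,
    # and every ordered pair among the three views is an anchor/positive pair.
    views = [benign1, benign2, adv]
    total, count = 0.0, 0
    for i in range(3):
        for j in range(3):
            if i != j:
                total += info_nce(views[i], views[j], negatives, tau)
                count += 1
    return total / count
```

When the three views of an image are close to each other and far from the negatives, the loss is small; misaligned views raise it, which is the pressure that pulls the adversarial view's representation toward its benign counterparts.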

Published

2024-03-24

How to Cite

Wang, R., Guo, Y., & Wang, Y. (2024). AGS: Affordable and Generalizable Substitute Training for Transferable Adversarial Attack. Proceedings of the AAAI Conference on Artificial Intelligence, 38(6), 5553-5562. https://doi.org/10.1609/aaai.v38i6.28365

Section

AAAI Technical Track on Computer Vision V