Training Meta-Surrogate Model for Transferable Adversarial Attack

Authors

  • Yunxiao Qin, State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing, China; Neuroscience and Intelligent Media Institute, Communication University of China, Beijing, China
  • Yuanhao Xiong, University of California, Los Angeles, USA
  • Jinfeng Yi, JD AI Research, Beijing, China
  • Cho-Jui Hsieh, University of California, Los Angeles, USA

DOI:

https://doi.org/10.1609/aaai.v37i8.26139

Keywords:

ML: Adversarial Learning & Robustness, CV: Adversarial Attacks & Robustness, PEAI: Safety, Robustness & Trustworthiness

Abstract

The problem of adversarial attacks on a black-box model when no queries are allowed poses a great challenge to the community and has been extensively investigated. In this setting, one simple yet effective method is to transfer adversarial examples obtained by attacking surrogate models in order to fool the target model. Previous works have studied what kinds of attacks on the surrogate model generate more transferable adversarial examples, but their performance is still limited by mismatches between the surrogate models and the target model. In this paper, we tackle the problem from a novel angle: instead of using the original surrogate models, can we obtain a Meta-Surrogate Model (MSM) such that attacks on this model transfer easily to other models? We show that this goal can be mathematically formulated as a bi-level optimization problem and design a differentiable attacker to make training feasible. Given one or a set of surrogate models, our method can thus obtain an MSM such that adversarial examples generated on it enjoy excellent transferability. Comprehensive experiments on CIFAR-10 and ImageNet demonstrate that by attacking the MSM, we can obtain more transferable adversarial examples that deceive black-box models, including adversarially trained ones, with much higher success rates than existing methods.
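To make the bi-level formulation concrete, below is a minimal PyTorch sketch of the idea described in the abstract: an inner, differentiable attack crafts adversarial examples on the MSM, and an outer step updates the MSM so that those examples also fool a set of held-out surrogate models. The single-step tanh-smoothed attacker, the function names, and the hyperparameters are illustrative assumptions, not the authors' exact algorithm.

# Illustrative sketch only: the differentiable single-step attacker and the
# outer transfer loss are assumptions based on the abstract, not the paper's
# exact implementation.
import torch
import torch.nn.functional as F

def differentiable_attack(msm, x, y, eps):
    """Inner level: craft adversarial examples on the Meta-Surrogate Model,
    keeping the perturbation inside the autograd graph so the outer loss can
    back-propagate into the MSM weights."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(msm(x), y)
    grad = torch.autograd.grad(loss, x, create_graph=True)[0]
    # sign() is non-differentiable, so a smooth stand-in (tanh) is used here
    # to keep the attacker differentiable (assumption).
    x_adv = x + eps * torch.tanh(1e3 * grad)
    return x_adv.clamp(0.0, 1.0)

def meta_train_step(msm, surrogate_models, optimizer, x, y, eps=8 / 255):
    """Outer level: update the MSM so its adversarial examples transfer to
    other (frozen) surrogate models."""
    optimizer.zero_grad()                          # optimizer holds only MSM params
    x_adv = differentiable_attack(msm, x, y, eps)  # inner (attack) level
    # Maximize the loss of the held-out surrogates on x_adv, i.e. minimize its
    # negative; gradients flow through the attack step into the MSM weights.
    transfer_loss = -sum(F.cross_entropy(m(x_adv), y) for m in surrogate_models)
    transfer_loss.backward()
    optimizer.step()
    return -transfer_loss.item()

In practice the outer step would be iterated over training batches; per the abstract, the resulting MSM is then attacked with a standard method and the generated adversarial examples are transferred to the black-box target model.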

Published

2023-06-26

How to Cite

Qin, Y., Xiong, Y., Yi, J., & Hsieh, C.-J. (2023). Training Meta-Surrogate Model for Transferable Adversarial Attack. Proceedings of the AAAI Conference on Artificial Intelligence, 37(8), 9516-9524. https://doi.org/10.1609/aaai.v37i8.26139

Issue

Vol. 37 No. 8 (2023)

Section

AAAI Technical Track on Machine Learning III