Training Meta-Surrogate Model for Transferable Adversarial Attack
DOI:
https://doi.org/10.1609/aaai.v37i8.26139
Keywords:
ML: Adversarial Learning & Robustness, CV: Adversarial Attacks & Robustness, PEAI: Safety, Robustness & Trustworthiness
Abstract
Adversarial attacks on a black-box model when no queries are allowed pose a great challenge to the community and have been extensively investigated. In this setting, one simple yet effective method is to generate adversarial examples by attacking surrogate models and then transfer them to fool the target model. Previous works have studied what kinds of attacks on the surrogate model generate more transferable adversarial examples, but their performance is still limited by mismatches between the surrogate models and the target model. In this paper, we tackle this problem from a novel angle: instead of using the original surrogate models, can we obtain a Meta-Surrogate Model (MSM) such that attacks on this model transfer easily to other models? We show that this goal can be mathematically formulated as a bi-level optimization problem, and we design a differentiable attacker to make training feasible. Given one or a set of surrogate models, our method can thus obtain an MSM such that adversarial examples generated on the MSM enjoy excellent transferability. Comprehensive experiments on CIFAR-10 and ImageNet demonstrate that by attacking the MSM, we can obtain stronger transferable adversarial examples that deceive black-box models, including adversarially trained ones, with much higher success rates than existing methods.
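To make the query-free transfer setting concrete, the following is a minimal toy sketch, not the paper's MSM training procedure. It assumes two hypothetical linear classifiers (`w_surrogate`, `w_target`), a logistic loss, and a single FGSM-style signed-gradient step. All names and numbers are invented for illustration. The adversarial example is computed only from the surrogate and is then evaluated on the target without querying it during the attack.

```python
import math

def score(w, x):
    # Linear decision score w . x
    return sum(wi * xi for wi, xi in zip(w, x))

def predict(w, x):
    # Binary label in {-1, +1}
    return 1 if score(w, x) >= 0 else -1

def fgsm(w, x, y, eps):
    # Logistic loss: log(1 + exp(-y * w.x))
    # Gradient w.r.t. x: -y * sigmoid(-y * w.x) * w
    z = score(w, x)
    coeff = -y / (1.0 + math.exp(y * z))
    grad = [coeff * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    # One signed-gradient ascent step inside an L-infinity ball of radius eps
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

# Hypothetical models: the attacker sees only the surrogate weights.
w_surrogate = [1.0, -2.0, 0.5]
w_target = [0.8, -1.5, 1.0]

x, y, eps = [0.5, -0.2, 0.1], 1, 0.5
x_adv = fgsm(w_surrogate, x, y, eps)

print("target on clean input:", predict(w_target, x))
print("target on transferred adversarial input:", predict(w_target, x_adv))
```

In this toy case the attack transfers because the two weight vectors are closely aligned. The mismatch between surrogate and target that the abstract describes corresponds to cases where they are not, which is the gap the MSM is trained to close.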
Published
2023-06-26
How to Cite
Qin, Y., Xiong, Y., Yi, J., & Hsieh, C.-J. (2023). Training Meta-Surrogate Model for Transferable Adversarial Attack. Proceedings of the AAAI Conference on Artificial Intelligence, 37(8), 9516-9524. https://doi.org/10.1609/aaai.v37i8.26139
Section
AAAI Technical Track on Machine Learning III