SSPAttack: A Simple and Sweet Paradigm for Black-Box Hard-Label Textual Adversarial Attack

Han Liu; Zhi Xu; Xiaotong Zhang; Xiaoming Xu; Feng Zhang; Fenglong Ma; Hongyang Chen; Hong Yu; Xianchao Zhang

doi:10.1609/aaai.v37i11.26553

Authors

Han Liu Dalian University of Technology
Zhi Xu Dalian University of Technology
Xiaotong Zhang Dalian University of Technology
Xiaoming Xu Dalian University of Technology
Feng Zhang Peking University
Fenglong Ma The Pennsylvania State University
Hongyang Chen Zhejiang Lab
Hong Yu Dalian University of Technology
Xianchao Zhang Dalian University of Technology

DOI:

https://doi.org/10.1609/aaai.v37i11.26553

Keywords:

SNLP: Adversarial Attacks & Robustness

Abstract

Hard-label textual adversarial attack is a challenging task, as only the predicted label information is available, and the text space is discrete and non-differentiable. Relevant research is still in its infancy, and only a handful of methods have been proposed. However, existing methods suffer from either the high complexity of genetic algorithms or inaccurate gradient estimation, and thus struggle to obtain adversarial examples with high semantic similarity and low perturbation rate in the tight-budget scenario. In this paper, we propose a simple and sweet paradigm for hard-label textual adversarial attack, named SSPAttack. Specifically, SSPAttack first uses initialization to generate an adversarial example, and removes unnecessary replacement words to reduce the number of changed words. Then it determines the replacement order and searches for an anchor synonym, thus avoiding going through all the synonyms. Finally, it pushes substitution words towards the original words until an appropriate adversarial example is obtained. The core idea of SSPAttack is simply swapping words, so its mechanism is simple. Experimental results on eight benchmark datasets and two real-world APIs show that SSPAttack performs well in terms of similarity, perturbation rate and query efficiency.
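The first two steps named in the abstract (initialization, then removal of unnecessary replacements) can be illustrated with a minimal sketch. This is not the authors' implementation: the synonym table, the toy classifier, and the deterministic initialization are all assumptions invented for illustration, and the replacement-order, anchor-synonym, and push-toward-original steps are omitted because the abstract does not specify them.

```python
from typing import Callable, Dict, List, Optional

# Hard-label oracle: the attacker sees only the predicted label.
Classifier = Callable[[List[str]], int]

# Hypothetical synonym table, invented for this illustration.
SYNONYMS: Dict[str, List[str]] = {
    "great": ["fine", "decent"],
    "movie": ["film"],
    "loved": ["liked"],
}


def toy_classifier(words: List[str]) -> int:
    """Toy stand-in for a victim model: 1 if a strong positive word appears, else 0."""
    return 1 if {"great", "loved"} & set(words) else 0


def initialize(classify: Classifier, original: List[str]) -> Optional[List[str]]:
    """Step 1 (simplified assumption): swap every word that has a synonym.

    Returns None when the swapped text does not change the predicted label.
    """
    candidate = [SYNONYMS.get(word, [word])[0] for word in original]
    return candidate if classify(candidate) != classify(original) else None


def remove_unnecessary(
    classify: Classifier, original: List[str], candidate: List[str]
) -> List[str]:
    """Step 2: restore original words wherever the text stays adversarial."""
    original_label = classify(original)
    reduced = list(candidate)
    for i, word in enumerate(original):
        if reduced[i] == word:
            continue
        substituted = reduced[i]
        reduced[i] = word  # try putting the original word back
        if classify(reduced) == original_label:
            reduced[i] = substituted  # restoring broke the attack, so undo it
    return reduced


def changed_words(original: List[str], candidate: List[str]) -> int:
    """Number of positions where the candidate differs from the original."""
    return sum(a != b for a, b in zip(original, candidate))
```

With the toy classifier, the sentence "i loved this great movie" is first changed in three positions by `initialize`; `remove_unnecessary` then restores "movie", since that swap is not needed to flip the label. Each call to `classify` counts as one query against the victim model, which is why avoiding redundant checks matters in the tight-budget scenario.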
