SSPAttack: A Simple and Sweet Paradigm for Black-Box Hard-Label Textual Adversarial Attack
DOI:
https://doi.org/10.1609/aaai.v37i11.26553
Keywords:
SNLP: Adversarial Attacks & Robustness
Abstract
Hard-label textual adversarial attack is a challenging task, as only the predicted label is available and the text space is discrete and non-differentiable. Research on this problem is still in its infancy, and only a handful of methods have been proposed. However, existing methods suffer from either the high complexity of genetic algorithms or inaccurate gradient estimation, and therefore struggle to obtain adversarial examples with high semantic similarity and a low perturbation rate in tight-budget scenarios. In this paper, we propose a simple and sweet paradigm for hard-label textual adversarial attack, named SSPAttack. Specifically, SSPAttack first uses an initialization step to generate an adversarial example and removes unnecessary replacement words to reduce the number of changed words. It then determines the replacement order and searches for an anchor synonym, thereby avoiding a pass over all the synonyms. Finally, it pushes substitution words towards the original words until an appropriate adversarial example is obtained. The core idea of SSPAttack is simply to swap words, which keeps the mechanism simple. Experimental results on eight benchmark datasets and two real-world APIs show that SSPAttack performs well in terms of semantic similarity, perturbation rate, and query efficiency.
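One step named in the abstract, removing unnecessary replacement words while only the predicted label is observable, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `prune_substitutions`, the left-to-right greedy order, and the toy classifier `toy_predict` are all assumptions made for illustration, and the abstract does not specify the order or criteria SSPAttack actually uses.

```python
from typing import Callable, List, Sequence


def prune_substitutions(
    original: Sequence[str],
    adversarial: Sequence[str],
    predict: Callable[[List[str]], int],
    true_label: int,
) -> List[str]:
    """Greedily revert substituted words while the text stays adversarial.

    Hypothetical helper: `predict` is a hard-label oracle that returns only
    a class index. Each attempted reversion costs one query.
    """
    if len(original) != len(adversarial):
        raise ValueError("original and adversarial must have the same length")

    current = list(adversarial)
    for i, (orig_word, adv_word) in enumerate(zip(original, adversarial)):
        if orig_word == adv_word:
            continue  # position was never changed
        # Try restoring the original word at position i.
        candidate = current[:i] + [orig_word] + current[i + 1:]
        # Keep the reversion only if the model is still fooled.
        if predict(candidate) != true_label:
            current = candidate
    return current


def toy_predict(words: List[str]) -> int:
    """Toy stand-in for a victim model (assumption): 1 if 'good' appears, else 0."""
    return 1 if "good" in words else 0


original = ["the", "movie", "was", "good", "and", "fun"]
adversarial = ["the", "film", "was", "fine", "and", "fun"]  # two words changed
pruned = prune_substitutions(original, adversarial, toy_predict, true_label=1)
```

In this toy setting the substitution of "film" for "movie" is not needed to change the predicted label, so it is reverted, while the substitution of "fine" for "good" is kept. A real attack would query the victim model or API in place of `toy_predict`.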
Published
2023-06-26
How to Cite
Liu, H., Xu, Z., Zhang, X., Xu, X., Zhang, F., Ma, F., Chen, H., Yu, H., & Zhang, X. (2023). SSPAttack: A Simple and Sweet Paradigm for Black-Box Hard-Label Textual Adversarial Attack. Proceedings of the AAAI Conference on Artificial Intelligence, 37(11), 13228-13235. https://doi.org/10.1609/aaai.v37i11.26553
Section
AAAI Technical Track on Speech & Natural Language Processing