TextHoaxer: Budgeted Hard-Label Adversarial Attacks on Text

Authors

  • Muchao Ye, The Pennsylvania State University
  • Chenglin Miao, University of Georgia
  • Ting Wang, The Pennsylvania State University
  • Fenglong Ma, The Pennsylvania State University

DOI:

https://doi.org/10.1609/aaai.v36i4.20303

Keywords:

Constraint Satisfaction And Optimization (CSO), Speech & Natural Language Processing (SNLP)

Abstract

This paper focuses on a new and challenging setting for hard-label adversarial attacks on text data: one that takes the query budget into account. Although existing approaches can successfully generate adversarial examples in the hard-label setting, they rely on the idealized assumption that the victim model does not restrict the number of queries. In real-world applications, however, the query budget is usually tight or limited. Moreover, existing hard-label adversarial attack techniques use a genetic algorithm to optimize discrete text data by maintaining a number of adversarial candidates during optimization, which can produce low-quality adversarial examples when the budget is tight. To solve this problem, we propose a new method named TextHoaxer, which formulates the budgeted hard-label adversarial attack task on text data as a gradient-based optimization problem over a perturbation matrix in the continuous word embedding space. Compared with genetic algorithm-based optimization, our solution uses only a single initialized adversarial example as the adversarial candidate for optimization, which significantly reduces the number of queries. The optimization is guided by a new objective function consisting of three terms: a semantic similarity term, a pair-wise perturbation constraint, and a sparsity constraint. The semantic similarity term and the pair-wise perturbation constraint help ensure high semantic similarity of the adversarial examples at both the overall text level and the individual word level, while the sparsity constraint explicitly restricts the number of perturbed words, which also helps improve the quality of the generated text.
We conduct extensive experiments on eight text datasets against three representative natural language models, and experimental results show that TextHoaxer can generate high-quality adversarial examples with higher semantic similarity and lower perturbation rate under the tight-budget setting.
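The three-term objective and the single-candidate, budgeted optimization described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the exact form of each term, the sentence encoder, how gradients are obtained under hard-label access, and the projection of perturbed embeddings back onto discrete (synonym) words are all assumptions or omissions here. In this sketch, cosine similarity of mean-pooled embeddings stands in for the text-level semantic similarity term, summed squared word-level distances stand in for the pair-wise perturbation constraint, and a group-L1 penalty (sum of per-word distances) serves as a differentiable surrogate for the sparsity constraint. The names `objective`, `budgeted_attack`, `apply_perturbation`, `toy_victim` and the weights `lam_pair`/`lam_sparse` are hypothetical. The sketch keeps one adversarial candidate and charges one query to the budget each time the victim's hard label is checked.

```python
import math
from typing import Callable, List, Tuple

Matrix = List[List[float]]  # one row per word, one column per embedding dimension


def apply_perturbation(adv: Matrix, theta: Matrix) -> Matrix:
    """Candidate embeddings = initial adversarial embeddings + perturbation matrix."""
    return [[a + t for a, t in zip(ra, rt)] for ra, rt in zip(adv, theta)]


def _norm(v: List[float]) -> float:
    return math.sqrt(sum(a * a for a in v))


def _cosine(u: List[float], v: List[float]) -> float:
    nu, nv = _norm(u), _norm(v)
    return 0.0 if nu == 0 or nv == 0 else sum(a * b for a, b in zip(u, v)) / (nu * nv)


def _mean_rows(m: Matrix) -> List[float]:
    return [sum(col) / len(m) for col in zip(*m)]


def objective(orig: Matrix, adv: Matrix, theta: Matrix,
              lam_pair: float = 0.1, lam_sparse: float = 0.05) -> float:
    """Hypothetical three-term loss to minimize (lower is better)."""
    cand = apply_perturbation(adv, theta)
    # (1) text-level similarity: cosine of mean-pooled vectors (stand-in for a sentence encoder)
    sim = _cosine(_mean_rows(orig), _mean_rows(cand))
    # distance between each candidate word and the original word at the same position
    dists = [_norm([c - o for c, o in zip(rc, ro)]) for rc, ro in zip(cand, orig)]
    # (2) pair-wise perturbation term: sum of squared word-level distances
    pair = sum(d * d for d in dists)
    # (3) sparsity surrogate: group-L1 (sum of row norms) pushes whole words back to the original
    sparse = sum(dists)
    return -sim + lam_pair * pair + lam_sparse * sparse


def budgeted_attack(orig: Matrix, adv: Matrix, is_adversarial: Callable[[Matrix], bool],
                    budget: int, steps: int = 50, lr: float = 0.5,
                    h: float = 1e-4) -> Tuple[Matrix, int]:
    """Optimize a single candidate; every hard-label victim query counts against the budget."""
    theta = [[0.0] * len(adv[0]) for _ in adv]
    queries = 0
    for _ in range(steps):
        base = objective(orig, adv, theta)
        # Forward-difference gradient of the local objective: costs no victim queries.
        grad = [[0.0] * len(row) for row in theta]
        for i, row in enumerate(theta):
            for j in range(len(row)):
                old = row[j]
                row[j] = old + h
                grad[i][j] = (objective(orig, adv, theta) - base) / h
                row[j] = old
        proposal = [[t - lr * g for t, g in zip(rt, rg)] for rt, rg in zip(theta, grad)]
        if objective(orig, adv, proposal) >= base:
            lr *= 0.5  # no local improvement: shrink the step without spending a query
            continue
        if queries >= budget:
            break  # query budget exhausted
        queries += 1  # one hard-label query: does the candidate still flip the label?
        if is_adversarial(apply_perturbation(adv, proposal)):
            theta = proposal
        else:
            lr *= 0.5
    return theta, queries


def toy_victim(emb: Matrix) -> bool:
    """Hypothetical victim: the label stays flipped while word 2's first coordinate > 0.5."""
    return emb[1][0] > 0.5


# Toy usage: word 2 of the "original" text was substituted to make the example adversarial.
orig = [[1.0, 0.0], [0.0, 1.0]]
adv = [[1.0, 0.0], [1.0, 1.0]]
theta, queries = budgeted_attack(orig, adv, toy_victim, budget=10)
print(queries, round(objective(orig, adv, theta), 4))
```

Because this hypothetical loss is computed entirely from the attacker's own embeddings, the sketch only queries the victim once a step already improves the loss locally, which is one simple way to stretch a tight budget; the abstract does not state whether TextHoaxer makes the same trade-off.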

Published

2022-06-28

How to Cite

Ye, M., Miao, C., Wang, T., & Ma, F. (2022). TextHoaxer: Budgeted Hard-Label Adversarial Attacks on Text. Proceedings of the AAAI Conference on Artificial Intelligence, 36(4), 3877-3884. https://doi.org/10.1609/aaai.v36i4.20303

Issue

Vol. 36 No. 4

Section

AAAI Technical Track on Constraint Satisfaction and Optimization