Generating Natural Language Attacks in a Hard Label Black Box Setting

Authors

  • Rishabh Maheshwary, Data Sciences and Analytics Center, Kohli Center on Intelligent Systems, International Institute of Information Technology, Hyderabad, India
  • Saket Maheshwary, Data Sciences and Analytics Center, Kohli Center on Intelligent Systems, International Institute of Information Technology, Hyderabad, India
  • Vikram Pudi, Data Sciences and Analytics Center, Kohli Center on Intelligent Systems, International Institute of Information Technology, Hyderabad, India

Keywords:

Adversarial Attacks & Robustness, Applications

Abstract

We study the important and challenging task of attacking natural language processing models in a hard-label black-box setting. We propose a decision-based attack strategy that crafts high-quality adversarial examples on text classification and entailment tasks. Our attack strategy leverages a population-based optimization algorithm to craft plausible and semantically similar adversarial examples by observing only the top label predicted by the target model. At each iteration, the optimization procedure allows word replacements that maximize the overall semantic similarity between the original and the adversarial text. Further, our approach does not rely on substitute models or any kind of training data. We demonstrate the efficacy of our approach through extensive experimentation and ablation studies on five state-of-the-art target models across seven benchmark datasets. Compared with attacks proposed in prior literature, we achieve a higher success rate with a lower word perturbation percentage, even in this highly restricted setting.
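
The abstract describes the attack only at a high level. As a rough illustration, and not the authors' implementation, the following Python sketch shows what a hard-label, population-based word-substitution search can look like. Everything in it is an assumption made for the example: the toy classifier `predict`, the hand-written `SYNONYMS` table, the position-overlap `similarity` score (a crude stand-in for a semantic similarity measure), and the `initialise`, `mutate`, and `attack` helpers.

```python
import random

# Hypothetical synonym table; a real attack would draw candidates from a
# much larger lexical resource.
SYNONYMS = {
    "good": ["fine", "decent"],
    "great": ["big", "large"],
    "movie": ["film"],
    "loved": ["liked", "enjoyed"],
}
POSITIVE = {"good", "great", "loved"}


def predict(words):
    """Toy hard-label target: exposes only the top label (1 or 0)."""
    return 1 if sum(w in POSITIVE for w in words) >= 2 else 0


def similarity(original, candidate):
    """Stand-in for semantic similarity: fraction of unchanged positions."""
    return sum(a == b for a, b in zip(original, candidate)) / len(original)


def initialise(original, true_label, rng, max_tries=100):
    """Randomly substitute words until the predicted label flips."""
    for _ in range(max_tries):
        cand = [rng.choice(SYNONYMS[w]) if w in SYNONYMS else w
                for w in original]
        if predict(cand) != true_label:
            return cand
    return None


def mutate(original, candidate, rng):
    """Change one substituted position: restore it or try another synonym."""
    changed = [i for i, (a, b) in enumerate(zip(original, candidate)) if a != b]
    child = list(candidate)
    if changed:
        i = rng.choice(changed)
        child[i] = rng.choice([original[i]] + SYNONYMS[original[i]])
    return child


def attack(original, rng, pop_size=8, generations=10):
    """Search for an adversarial text that stays close to the original."""
    true_label = predict(original)
    best = initialise(original, true_label, rng)
    if best is None:
        return None
    for _ in range(generations):
        population = [mutate(original, best, rng) for _ in range(pop_size)]
        # Hard-label constraint: only the top label is observed, so a
        # candidate is kept only if that label still differs from the original.
        adversarial = [c for c in population if predict(c) != true_label]
        # Among adversarial candidates, prefer the most similar one.
        best = max(adversarial + [best], key=lambda c: similarity(original, c))
    return best


if __name__ == "__main__":
    text = "i loved this great movie".split()
    adv = attack(text, random.Random(0))
    print(" ".join(adv), predict(adv), similarity(text, adv))
```

The sketch queries the target only for its predicted label, never for scores or gradients, and uses no substitute model or training data, which mirrors the restrictions stated in the abstract. The selection step keeps the current best candidate in the pool, so similarity to the original text never decreases between generations.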

Published

2021-05-18

How to Cite

Maheshwary, R., Maheshwary, S., & Pudi, V. (2021). Generating Natural Language Attacks in a Hard Label Black Box Setting. Proceedings of the AAAI Conference on Artificial Intelligence, 35(15), 13525-13533. Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/17595

Section

AAAI Technical Track on Speech and Natural Language Processing II