Bigram and Unigram Based Text Attack via Adaptive Monotonic Heuristic Search
Keywords: Security, Adversarial Learning & Robustness, Adversarial Attacks & Robustness
Abstract
Deep neural networks (DNNs) are known to be vulnerable to adversarial images, while their robustness in text classification is rarely studied. Several lines of text attack methods have been proposed in the literature, such as character-level, word-level, and sentence-level attacks. However, it is still a challenge to minimize the number of word distortions necessary to induce misclassification while simultaneously ensuring lexical correctness, syntactic correctness, and semantic similarity. In this paper, we propose the Bigram and Unigram based Monotonic Heuristic Search (BU-MHS) method to examine the vulnerability of deep models. Our method has three major merits. Firstly, we propose to attack text documents not only at the unigram word level but also at the bigram level to avoid producing meaningless outputs. Secondly, we propose a hybrid method to replace the input words with both their synonyms and sememe candidates, which greatly enriches potential substitutions compared to only using synonyms. Lastly, we design a search algorithm, i.e., Monotonic Heuristic Search (MHS), to determine the priority of word replacements, aiming to reduce the modification cost in an adversarial attack. We evaluate the effectiveness of BU-MHS on the IMDB, AG's News, and Yahoo! Answers text datasets by attacking four state-of-the-art DNN models. Experimental results show that our BU-MHS achieves the highest attack success rate by changing the smallest number of words compared with other existing models.
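To make the idea of unigram and bigram substitution concrete, the following is a minimal sketch of a plain greedy substitution attack. It is not the MHS algorithm from the paper, whose details the abstract does not give. The toy scorer `toy_prob_positive`, the word weights, and the candidate tables `UNIGRAM_CANDS` and `BIGRAM_CANDS` are all hypothetical stand-ins for a real classifier and for synonym or sememe lookups.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical toy classifier: sums per-word weights, returns P(positive).
WEIGHTS = {"great": 2.0, "good": 1.0, "okay": -0.3, "really": 0.5, "pretty": 0.1}

def toy_prob_positive(tokens: List[str]) -> float:
    score = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical candidate tables (stand-ins for synonym/sememe lookups).
UNIGRAM_CANDS: Dict[str, List[str]] = {
    "great": ["good", "okay"],
    "really": ["truly"],
}
BIGRAM_CANDS: Dict[Tuple[str, str], List[Tuple[str, str]]] = {
    ("really", "great"): [("pretty", "okay")],
}

def greedy_attack(
    tokens: List[str],
    prob_true: Callable[[List[str]], float],
    max_changes: int = 3,
) -> Tuple[List[str], int]:
    """Greedily apply the substitution that lowers prob_true the most.

    Stops when the true-class probability falls below 0.5, when the
    change budget is spent, or when no substitution helps.
    Returns the perturbed tokens and the number of words changed.
    """
    current = list(tokens)
    modified = set()  # positions already replaced; never edited twice
    changes = 0
    while changes < max_changes and prob_true(current) >= 0.5:
        base = prob_true(current)
        best = None  # (probability drop, trial tokens, positions touched)
        for i in range(len(current)):
            if i in modified:
                continue
            # Bigram candidates replace two adjacent words at once.
            if i + 1 < len(current) and (i + 1) not in modified:
                key = (current[i], current[i + 1])
                for rep in BIGRAM_CANDS.get(key, []):
                    trial = current[:i] + list(rep) + current[i + 2:]
                    drop = base - prob_true(trial)
                    if best is None or drop > best[0]:
                        best = (drop, trial, {i, i + 1})
            # Unigram candidates replace a single word.
            for rep in UNIGRAM_CANDS.get(current[i], []):
                trial = current[:i] + [rep] + current[i + 1:]
                drop = base - prob_true(trial)
                if best is None or drop > best[0]:
                    best = (drop, trial, {i})
        if best is None or best[0] <= 0:
            break  # nothing left that lowers the true-class probability
        current = best[1]
        modified |= best[2]
        changes += len(best[2])
    return current, changes

adversarial, n_changed = greedy_attack(
    ["a", "really", "great", "movie"], toy_prob_positive
)
```

In this toy setting the bigram replacement is chosen in the first step because it lowers the positive-class probability more than any single-word replacement. A real attack would also need checks for lexical, syntactic, and semantic validity, which this sketch omits.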
How to Cite
Yang, X., Liu, W., Bailey, J., Tao, D., & Liu, W. (2021). Bigram and Unigram Based Text Attack via Adaptive Monotonic Heuristic Search. Proceedings of the AAAI Conference on Artificial Intelligence, 35(1), 706-714. Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/16151
AAAI Technical Track on Application Domains