TY  - JOUR
AU  - Wei, Xingxing
AU  - Zhu, Jun
AU  - Yuan, Sha
AU  - Su, Hang
PY  - 2019/07/17
Y2  - 2024/03/29
TI  - Sparse Adversarial Perturbations for Videos
JF  - Proceedings of the AAAI Conference on Artificial Intelligence
JA  - AAAI
VL  - 33
IS  - 01
SE  - AAAI Technical Track: Vision
DO  - 10.1609/aaai.v33i01.33018973
UR  - https://ojs.aaai.org/index.php/AAAI/article/view/4927
SP  - 8973
EP  - 8980
AB  - <p>Although adversarial samples of deep neural networks (DNNs) have been intensively studied on static images, their extensions in videos are never explored. Compared with images, attacking a video needs to consider not only spatial cues but also temporal cues. Moreover, to improve the imperceptibility as well as reduce the computation cost, perturbations should be added on as few frames as possible, i.e., adversarial perturbations are temporally <em>sparse</em>. This further motivates the <em>propagation</em> of perturbations, which denotes that perturbations added on the current frame can transfer to the next frames via their temporal interactions. Thus, no (or few) extra perturbations are needed for these frames to misclassify them. To this end, we propose the first white-box video attack method, which utilizes an <em>l</em><sub>2<em>,</em>1</sub>-norm based optimization algorithm to compute the sparse adversarial perturbations for videos. We choose the action recognition as the targeted task, and networks with a CNN+RNN architecture as threat models to verify our method. Thanks to the propagation, we can compute perturbations on a shortened version video, and then adapt them to the long version video to fool DNNs. Experimental results on the UCF101 dataset demonstrate that even only one frame in a video is perturbed, the fooling rate can still reach 59.7%.</p>
ER  - 