Boosting the Transferability of Video Adversarial Examples via Temporal Translation

Zhipeng Wei; Jingjing Chen; Zuxuan Wu; Yu-Gang Jiang

doi:10.1609/aaai.v36i3.20168

Authors

Zhipeng Wei Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University Shanghai Collaborative Innovation Center on Intelligent Visual Computing
Jingjing Chen Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University Shanghai Collaborative Innovation Center on Intelligent Visual Computing
Zuxuan Wu Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University Shanghai Collaborative Innovation Center on Intelligent Visual Computing
Yu-Gang Jiang Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University Shanghai Collaborative Innovation Center on Intelligent Visual Computing

DOI:

https://doi.org/10.1609/aaai.v36i3.20168

Keywords:

Computer Vision (CV)

Abstract

Although deep-learning based video recognition models have achieved remarkable success, they are vulnerable to adversarial examples that are generated by adding human-imperceptible perturbations on clean video samples. As indicated in recent studies, adversarial examples are transferable, which makes it feasible for black-box attacks in real-world applications. Nevertheless, most existing adversarial attack methods have poor transferability when attacking other video models and transfer-based attacks on video models are still unexplored. To this end, we propose to boost the transferability of video adversarial examples for black-box attacks on video recognition models. Through extensive analysis, we discover that different video recognition models rely on different discriminative temporal patterns, leading to the poor transferability of video adversarial examples. This motivates us to introduce a temporal translation attack method, which optimizes the adversarial perturbations over a set of temporal translated video clips. By generating adversarial examples over translated videos, the resulting adversarial examples are less sensitive to temporal patterns existed in the white-box model being attacked and thus can be better transferred. Extensive experiments on the Kinetics-400 dataset and the UCF-101 dataset demonstrate that our method can significantly boost the transferability of video adversarial examples. For transfer-based attack against video recognition models, it achieves a 61.56% average attack success rate on the Kinetics-400 and 48.60% on the UCF-101.

Boosting the Transferability of Video Adversarial Examples via Temporal Translation

Authors

DOI:

Keywords:

Abstract

Downloads

Published

How to Cite

Issue

Section

Information

Developed By

Subscription