Boosting the Transferability of Video Adversarial Examples via Temporal Translation

Authors

  • Zhipeng Wei, Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University; Shanghai Collaborative Innovation Center on Intelligent Visual Computing
  • Jingjing Chen, Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University; Shanghai Collaborative Innovation Center on Intelligent Visual Computing
  • Zuxuan Wu, Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University; Shanghai Collaborative Innovation Center on Intelligent Visual Computing
  • Yu-Gang Jiang, Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University; Shanghai Collaborative Innovation Center on Intelligent Visual Computing

DOI:

https://doi.org/10.1609/aaai.v36i3.20168

Keywords:

Computer Vision (CV)

Abstract

Although deep-learning-based video recognition models have achieved remarkable success, they are vulnerable to adversarial examples, which are generated by adding human-imperceptible perturbations to clean video samples. As recent studies indicate, adversarial examples are transferable, which makes black-box attacks feasible in real-world applications. Nevertheless, most existing adversarial attack methods transfer poorly when attacking other video models, and transfer-based attacks on video models remain unexplored. To this end, we propose to boost the transferability of video adversarial examples for black-box attacks on video recognition models. Through extensive analysis, we find that different video recognition models rely on different discriminative temporal patterns, which leads to the poor transferability of video adversarial examples. This motivates us to introduce a temporal translation attack method, which optimizes the adversarial perturbations over a set of temporally translated video clips. Because the adversarial examples are generated over translated videos, they are less sensitive to the temporal patterns of the white-box model being attacked and thus transfer better. Extensive experiments on the Kinetics-400 and UCF-101 datasets demonstrate that our method significantly boosts the transferability of video adversarial examples. For transfer-based attacks against video recognition models, it achieves an average attack success rate of 61.56% on Kinetics-400 and 48.60% on UCF-101.
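The idea of optimizing a perturbation over temporally translated clips can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: temporal translation is modeled as a circular shift of frames along the time axis (`np.roll`), gradients from every shift in `[-max_shift, max_shift]` are averaged uniformly, and the update is an iterative sign-gradient step projected onto an L∞ ball of radius `epsilon`. The function names and the `grad_fn` callback (assumed to return the white-box model's loss gradient for a single clip) are hypothetical.

```python
import numpy as np

def temporal_shift(video, shift):
    # Assumption: translation is a circular shift of frames along axis 0 (time).
    return np.roll(video, shift, axis=0)

def tt_gradient(video, grad_fn, max_shift):
    """Average the loss gradient over temporally shifted copies of `video`.

    grad_fn(clip) is a hypothetical callback returning d(loss)/d(clip)
    from the white-box model for one clip of shape (T, H, W, C).
    """
    grads = []
    for s in range(-max_shift, max_shift + 1):
        g = grad_fn(temporal_shift(video, s))
        # Shift the gradient back so it lines up with the original frames
        # (chain rule through the permutation applied by np.roll).
        grads.append(temporal_shift(g, -s))
    # Assumption: uniform weights over all shifts.
    return np.mean(grads, axis=0)

def tt_attack(video, grad_fn, epsilon, step_size, num_iters, max_shift):
    """Iterative sign-gradient attack using the shift-averaged gradient."""
    adv = video.astype(np.float64).copy()
    for _ in range(num_iters):
        g = tt_gradient(adv, grad_fn, max_shift)
        adv = adv + step_size * np.sign(g)                 # ascend the loss
        adv = np.clip(adv, video - epsilon, video + epsilon)  # L-inf projection
        adv = np.clip(adv, 0.0, 1.0)                        # valid pixel range
    return adv
```

Because each shifted clip's gradient is re-aligned with `temporal_shift(g, -s)` before averaging, the combined gradient does not favor the frame positions that one particular temporal alignment makes discriminative for the white-box model, which is the intuition the abstract gives for better transfer.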

Published

2022-06-28

How to Cite

Wei, Z., Chen, J., Wu, Z., & Jiang, Y.-G. (2022). Boosting the Transferability of Video Adversarial Examples via Temporal Translation. Proceedings of the AAAI Conference on Artificial Intelligence, 36(3), 2659-2667. https://doi.org/10.1609/aaai.v36i3.20168

Section

AAAI Technical Track on Computer Vision III