Delving into the Local: Dynamic Inconsistency Learning for DeepFake Video Detection

Zhihao Gu; Yang Chen; Taiping Yao; Shouhong Ding; Jilin Li; Lizhuang Ma

doi:10.1609/aaai.v36i1.19955

Authors

Zhihao Gu: School of Electronic and Electrical Engineering, Shanghai Jiao Tong University; YouTu Lab, Tencent
Yang Chen: YouTu Lab, Tencent
Taiping Yao: YouTu Lab, Tencent
Shouhong Ding: YouTu Lab, Tencent
Jilin Li: YouTu Lab, Tencent
Lizhuang Ma: School of Electronic and Electrical Engineering, Shanghai Jiao Tong University; MoE Key Lab of Artificial Intelligence, Shanghai Jiao Tong University; East China Normal University

DOI:

https://doi.org/10.1609/aaai.v36i1.19955

Keywords:

Computer Vision (CV)

Abstract

The rapid development of facial manipulation techniques has aroused public concern in recent years. Existing deepfake video detection approaches attempt to capture the discriminative features between real and fake faces based on temporal modelling. However, these works impose supervision on sparsely sampled video frames and overlook the local motions among adjacent frames, which encode rich inconsistency information that can serve as an efficient indicator for DeepFake video detection. To mitigate this issue, we delve into the local motion and propose a novel sampling unit named snippet, which contains a few successive video frames for local temporal inconsistency learning. Moreover, we design an Intra-Snippet Inconsistency Module (Intra-SIM) and an Inter-Snippet Interaction Module (Inter-SIM) to establish a dynamic inconsistency modelling framework. Specifically, the Intra-SIM applies bi-directional temporal difference operations and a learnable convolution kernel to mine the short-term motions within each snippet. The Inter-SIM is then devised to promote cross-snippet information interaction to form global representations. The Intra-SIM and Inter-SIM work in an alternating manner and can be plugged into existing 2D CNNs. Our method outperforms state-of-the-art competitors on four popular benchmark datasets, i.e., FaceForensics++, Celeb-DF, DFDC and WildDeepfake. Extensive experiments and visualizations are also presented to further illustrate its effectiveness.
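
The snippet sampling and bi-directional temporal differences described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `sample_snippets` and `bidirectional_differences` are hypothetical, the uniform-segment sampling scheme is an assumption, frames are represented as flat feature lists, and the learnable convolution kernel and the Inter-SIM are omitted entirely.

```python
from typing import List, Sequence, Tuple


def sample_snippets(num_frames: int, num_snippets: int, snippet_len: int) -> List[List[int]]:
    """Pick `num_snippets` groups of `snippet_len` successive frame indices.

    Assumption: the video is split into equal segments and each snippet
    starts at the beginning of its segment.
    """
    if num_snippets * snippet_len > num_frames:
        raise ValueError("video is too short for the requested snippets")
    segment = num_frames // num_snippets
    return [list(range(i * segment, i * segment + snippet_len))
            for i in range(num_snippets)]


def bidirectional_differences(
    snippet: Sequence[Sequence[float]],
) -> Tuple[List[List[float]], List[List[float]]]:
    """Return (forward, backward) temporal differences within one snippet.

    forward[t]  = x[t+1] - x[t], zero-padded at the last frame
    backward[t] = x[t-1] - x[t], zero-padded at the first frame
    """
    num = len(snippet)
    dim = len(snippet[0])
    zeros = [0.0] * dim
    forward = [
        [b - a for a, b in zip(snippet[t], snippet[t + 1])] if t + 1 < num else list(zeros)
        for t in range(num)
    ]
    backward = [
        [b - a for a, b in zip(snippet[t], snippet[t - 1])] if t > 0 else list(zeros)
        for t in range(num)
    ]
    return forward, backward


if __name__ == "__main__":
    print(sample_snippets(num_frames=32, num_snippets=4, snippet_len=4))
    fwd, bwd = bidirectional_differences([[1.0, 2.0], [3.0, 5.0], [6.0, 9.0]])
    print(fwd)
    print(bwd)
```

In a full model, difference maps like these would presumably be fed to learned layers rather than used directly; the sketch only shows how adjacent-frame differences expose short-term motion within a snippet.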
