Exposing Deep-faked Videos by Anomalous Co-motion Pattern Detection

Authors: Gengxing Wang, Jiahuan Zhou, Ying Wu

Published: 2020-08-11 16:47:02+00:00

AI Summary

This paper proposes a fully interpretable video forensic method for detecting deepfake videos by analyzing co-motion patterns. It models the temporal motion of multiple spatial locations to extract a content-independent representation that is robust to compression artifacts.

Abstract

Recent deep-learning-based video synthesis approaches, in particular applications that can forge identities such as DeepFake, have raised great security concerns. Corresponding deep forensic methods have therefore been proposed to tackle this problem. However, existing methods are either based on unexplainable deep networks, which greatly degrades the interpretability that is central to media forensics, or rely on fragile image statistics such as noise patterns, which in real-world scenarios are easily deteriorated by data compression. In this paper, we propose a fully interpretable video forensic method designed specifically to expose deep-faked videos. To enhance generalizability across videos with varied content, we model the temporal motion of multiple specific spatial locations in a video to extract a robust and reliable representation, called the Co-Motion Pattern. This conjoint pattern is mined across local motion features and is independent of the video content, so instance-wise variation is also largely alleviated. More importantly, the proposed co-motion pattern possesses both superior interpretability and sufficient robustness against data compression for deep-faked videos. We conduct extensive experiments to empirically demonstrate the superiority and effectiveness of our approach, under both classification and anomaly detection evaluation settings, against state-of-the-art deep forensic methods.


Key findings
The proposed co-motion pattern effectively distinguishes deepfake videos from real videos with high accuracy in both classification and anomaly detection settings. The approach is robust to compression and noise, and shows good generalizability across different deepfake generation methods.
Approach
The method extracts local motion features from facial landmarks in video frames. These features are grouped by their pairwise correlation, forming a co-motion pattern that represents motion consistency across the face. Deepfake videos exhibit anomalous co-motion patterns compared to real videos (see the sketch below).
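A minimal sketch of this idea, assuming facial landmarks have already been tracked per frame. The correlation threshold, the use of vertical displacement as the motion signal, and the grouping rule are illustrative assumptions, not the authors' exact pipeline:

import numpy as np

def co_motion_pattern(landmarks, corr_threshold=0.9):
    """Build a toy co-motion pattern from tracked facial landmarks.

    landmarks: array of shape (T, K, 2) -- K landmark positions over T frames.
    Returns a (K, K) binary matrix marking landmark pairs whose vertical
    motion is strongly correlated (illustrative grouping rule).
    """
    # Local motion features: frame-to-frame displacement of each landmark.
    motion = np.diff(landmarks, axis=0)       # (T-1, K, 2)
    # Use vertical displacement as a 1-D motion signal per landmark.
    signals = motion[..., 1].T                # (K, T-1)
    # Pairwise correlation of motion signals across time.
    corr = np.corrcoef(signals)               # (K, K)
    # Co-motion pattern: which landmarks move consistently together.
    return (corr > corr_threshold).astype(np.uint8)

# Example: 100 frames, 68 landmarks (a dlib-style landmark set is assumed).
landmarks = np.cumsum(np.random.randn(100, 68, 2), axis=0)
pattern = co_motion_pattern(landmarks)
print(pattern.shape)  # (68, 68)

In this toy version, real videos would tend to produce consistent grouping structure (e.g., landmarks on the same facial part moving together), while manipulated faces would deviate from that structure.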
Datasets
FaceForensics++ (including the DeepFakes, FaceSwap, Face2Face, and NeuralTextures manipulations), plus a real-video dataset from Google AI.
Model(s)
AdaBoost classifier; the main contribution is not a model but a feature extraction method (the co-motion pattern). An illustrative classification setup is sketched below.
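A hedged illustration of how flattened co-motion patterns could be fed to an AdaBoost classifier. scikit-learn is assumed here, and the feature vectorization, dataset sizes, and hyperparameters are placeholders rather than the paper's settings:

import numpy as np
from sklearn.ensemble import AdaBoostClassifier

# Hypothetical training data: one flattened co-motion pattern per video.
# X has shape (n_videos, K*K); y is 1 for deep-faked, 0 for real.
rng = np.random.default_rng(0)
X = rng.random((200, 68 * 68))
y = rng.integers(0, 2, size=200)

clf = AdaBoostClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
print(clf.predict(X[:5]))  # predicted real/fake labels for the first 5 videos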
Author countries
USA