Face Forgery Detection Based on Facial Region Displacement Trajectory Series

Authors: YuYang Sun, ZhiYong Zhang, Isao Echizen, Huy H. Nguyen, ChangZhen Qiu, Lu Sun

Published: 2022-12-07 14:47:54+00:00

AI Summary

This paper proposes a deepfake detection method based on analyzing the trajectory of facial region displacement. It utilizes a virtual-anchor-based method to extract this trajectory and employs a dual-stream spatial-temporal graph attention network with a GRU backbone to detect anomalies in the trajectory sequences.

Abstract

Deep-learning-based technologies such as deepfakes have been attracting widespread attention in both society and academia, particularly those used to synthesize forged face images. These automatic face manipulation technologies, which require no professional skills, can be used to replace the face in an original image or video with any target object while maintaining the expression and demeanor. Since human faces are closely related to identity characteristics, maliciously disseminated identity-manipulated videos could trigger a crisis of public trust in the media and could even have serious political, social, and legal implications. To effectively detect manipulated videos, we focus on the position offset introduced in the face blending process, which results from the forced affine transformation of the normalized forged face. We introduce a method for detecting manipulated videos that is based on the trajectory of facial region displacement. Specifically, we develop a virtual-anchor-based method for extracting the facial trajectory, which can robustly represent displacement information. This information is used to construct a network, based on dual-stream spatial-temporal graph attention and a gated recurrent unit backbone, for exposing multidimensional artifacts in the trajectory sequences of manipulated videos. Testing of our method on various manipulation datasets demonstrated that its accuracy and generalization ability are competitive with those of the leading detection methods.
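To make the position-offset idea concrete, below is a minimal Python sketch, not the authors' code, of how an align-then-invert affine round trip can leave a small residual displacement of the facial region in each frame. The eye/nose template coordinates, the three-point landmark input, and the rounding used to simulate normalized-space quantization are all illustrative assumptions.

```python
import cv2
import numpy as np

# Illustrative sketch: a face-swap pipeline typically normalizes the source
# face to a canonical template and warps the blended result back with the
# inverse affine transform. Small estimation/quantization errors in this
# round trip shift the facial region slightly in every frame; those
# per-frame offsets form the displacement series the paper analyzes.

# Canonical template (two eye centers + nose tip) -- assumed coordinates.
TEMPLATE = np.float32([[38.3, 51.7], [73.5, 51.5], [56.0, 92.4]])

def roundtrip_offset(landmarks: np.ndarray) -> np.ndarray:
    """Positional offset of the face center after an align-then-invert
    affine round trip. `landmarks` is a (3, 2) float array of the same
    three points as TEMPLATE, in frame coordinates."""
    pts = landmarks.astype(np.float32)
    M = cv2.getAffineTransform(pts, TEMPLATE)      # forward alignment
    M_inv = cv2.invertAffineTransform(M)           # warp back to the frame
    center = pts.mean(axis=0)
    # Rounding simulates the integer pixel grid of the normalized face.
    aligned = (M[:, :2] @ center + M[:, 2]).round()
    restored = M_inv[:, :2] @ aligned + M_inv[:, 2]
    return restored - center                       # per-frame offset vector
```

Collected over all frames, these offsets form a displacement series; the paper's premise is that pristine videos yield smooth, near-zero series, while blending pipelines leave systematic per-frame jitter.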


Key findings
The proposed FTDN achieves high accuracy on various deepfake datasets, exceeding state-of-the-art methods in many cases. The method shows robustness to moderate video compression, but accuracy degrades with strong compression. The ablation study highlights the importance of the GRU encoder and the graph attention mechanisms in achieving high performance.
Approach
The approach extracts facial region displacement trajectories using a virtual-anchor method, which is more robust than traditional landmark-based or optical flow methods. A dual-stream spatial-temporal graph attention network with a GRU backbone is then used to classify these trajectories as real or fake, identifying spatial-temporal anomalies indicative of deepfakes.
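The summary does not spell out how the virtual anchor is constructed, so the sketch below assumes a robust aggregate (the per-frame median of detected landmarks) purely for illustration; the trajectory is then the scale-normalized frame-to-frame displacement of that anchor.

```python
import numpy as np

def extract_trajectory(landmark_seq: np.ndarray) -> np.ndarray:
    """Build a displacement trajectory from per-frame facial landmarks.

    landmark_seq: (T, K, 2) array of K landmark coordinates over T frames.
    Returns a (T-1, 2) series of frame-to-frame anchor displacements.
    """
    # Virtual anchor per frame: a robust aggregate of all landmarks.
    # Using the median is an assumption; it damps jitter from any single
    # mis-detected landmark, which matches the stated motivation for
    # preferring a virtual anchor over raw landmarks or optical flow.
    anchors = np.median(landmark_seq, axis=1)        # (T, 2)
    displacements = np.diff(anchors, axis=0)         # (T-1, 2)
    # Normalize by a face-scale proxy (per-frame landmark spread, also an
    # assumption) so the series is resolution-invariant.
    scale = landmark_seq.std(axis=(1, 2))[:-1] + 1e-6
    return displacements / scale[:, None]
```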
Datasets
FaceForensics++ (FF++), including the Deepfakes, FaceSwap, Face2Face, NeuralTextures, and FaceShifter manipulation subsets.
Model(s)
Fake Trajectory Detection Network (FTDN), which uses dual-stream spatial-temporal graph attention and a gated recurrent unit (GRU) backbone.
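As a rough picture of such an architecture, here is a minimal PyTorch sketch of a dual-stream spatial-temporal graph-attention network with a GRU backbone. Layer widths, the graph construction (facial regions as spatial nodes, time steps as temporal nodes), and the concatenation-based stream fusion are assumptions; the summary does not specify FTDN's exact design.

```python
import torch
import torch.nn as nn

class GraphAttention(nn.Module):
    """Single-head graph attention over N fully connected nodes
    (a simplified GAT layer, assumed here for illustration)."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.attn = nn.Linear(2 * dim, 1)

    def forward(self, x):                      # x: (B, N, D)
        h = self.proj(x)
        B, N, D = h.shape
        hi = h.unsqueeze(2).expand(B, N, N, D)
        hj = h.unsqueeze(1).expand(B, N, N, D)
        e = torch.tanh(self.attn(torch.cat([hi, hj], dim=-1))).squeeze(-1)
        a = torch.softmax(e, dim=-1)           # attention over neighbors
        return torch.relu(a @ h)               # aggregated node features

class FTDNSketch(nn.Module):
    """Dual-stream spatial-temporal graph attention + GRU classifier.
    Stream sizes and fusion are assumptions, not the paper's exact design."""
    def __init__(self, feat_dim=2, hidden=64):
        super().__init__()
        self.spatial_gat = GraphAttention(feat_dim)   # attend across regions
        self.temporal_gat = GraphAttention(feat_dim)  # attend across time
        self.gru = nn.GRU(2 * feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)              # real vs. fake logits

    def forward(self, traj):                   # traj: (B, T, R, 2) offsets
        B, T, R, D = traj.shape
        s = self.spatial_gat(traj.reshape(B * T, R, D)).mean(1).view(B, T, D)
        t = self.temporal_gat(traj.mean(2))    # nodes = time steps, (B, T, D)
        _, h = self.gru(torch.cat([s, t], dim=-1))    # fuse both streams
        return self.head(h[-1])                # one prediction per video
```

For example, `FTDNSketch()(torch.randn(8, 30, 5, 2))` returns an `(8, 2)` tensor of real/fake logits for a batch of 8 clips with 30 frames and 5 facial-region displacement vectors each.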
Author countries
Japan, China