Deepfake detection in videos with multiple faces using geometric-fakeness features

Authors: Kirill Vyshegorodtsev, Dmitry Kudiyarov, Alexander Balashov, Alexander Kuzmin

Published: 2024-10-10 13:10:34+00:00

AI Summary

This paper proposes a deepfake detection method using geometric-fakeness features (GFF) that combine temporal inconsistencies in per-frame deepfake scores with geometric characteristics of faces in a video. A deep learning model analyzes these GFFs to predict deepfakes, outperforming state-of-the-art methods on multiple benchmark datasets.

Abstract

Due to the development of facial manipulation techniques in recent years, deepfake detection in video streams has become an important problem for face biometrics, brand monitoring or online video conferencing solutions. In biometric authentication, replacing a real data stream with a deepfake can bypass a liveness detection system. With a deepfake in a video conference, an attacker can infiltrate a private meeting. Deepfakes of victims or public figures can also be used by fraudsters for blackmail, extortion and financial fraud. The task of detecting deepfakes is therefore relevant to ensuring privacy and security. The performance of existing deepfake detection approaches deteriorates when multiple faces are present in a video simultaneously or when other objects are erroneously classified as faces. In our research we propose to use geometric-fakeness features (GFF) that characterize the dynamic degree of a face's presence in a video together with its per-frame deepfake scores. To analyze temporal inconsistencies in GFFs between frames, we train a complex deep learning model that outputs a final deepfake prediction. We apply our approach to videos in which multiple faces are present simultaneously. Such videos often occur in practice, e.g., in an online video conference. In this case, real faces appearing in a frame together with a deepfake face will significantly affect deepfake detection, and our approach makes it possible to counter this problem. Through extensive experiments we demonstrate that our approach outperforms current state-of-the-art methods on popular benchmark datasets such as FaceForensics++, DFDC, Celeb-DF and WildDeepFake. The proposed approach remains accurate when trained to detect multiple different deepfake generation techniques.


Key findings
The proposed GFF-based method outperforms existing state-of-the-art deepfake detection methods across multiple benchmark datasets. It maintains this performance when multiple faces are present in a video simultaneously. The model also remains accurate when trained to detect multiple different deepfake generation techniques.
Approach
The approach uses EfficientNet-B4 to extract fakeness features from faces and FaceNet to group faces by person across frames, and it computes geometric features based on face area. The fakeness and geometric features are combined into GFFs and fed into a CNN followed by a fully connected network that predicts whether the video is a deepfake.
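As a rough illustration of how such features might be assembled, the sketch below builds one per-frame sequence for each tracked person. The exact GFF definition is not given in this summary, so the layout used here is an assumption: each frame contributes a pair (relative face area, fakeness score), with zeros when the person is absent from the frame. The names `FaceDetection`, `relative_face_area` and `build_gff_sequences` are hypothetical, and the `track_id` and `fake_score` values are assumed to come from upstream models such as FaceNet-based grouping and an EfficientNet-B4 classifier.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FaceDetection:
    """One detected face in one frame (hypothetical container)."""
    frame_index: int
    track_id: int  # identity label, assumed to come from embedding-based grouping
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels
    fake_score: float  # per-frame fakeness score in [0, 1], assumed precomputed


def relative_face_area(box: Tuple[float, float, float, float],
                       frame_width: int, frame_height: int) -> float:
    """Fraction of the frame covered by the face bounding box."""
    x_min, y_min, x_max, y_max = box
    width = max(0.0, x_max - x_min)
    height = max(0.0, y_max - y_min)
    return (width * height) / float(frame_width * frame_height)


def build_gff_sequences(detections: List[FaceDetection], num_frames: int,
                        frame_width: int, frame_height: int
                        ) -> Dict[int, List[Tuple[float, float]]]:
    """Return, per track, a list of (relative area, fakeness score) per frame.

    Frames in which a track has no detection keep the placeholder (0.0, 0.0).
    This layout is an assumption made for illustration only.
    """
    sequences: Dict[int, List[Tuple[float, float]]] = {}
    for det in detections:
        seq = sequences.setdefault(det.track_id, [(0.0, 0.0)] * num_frames)
        area = relative_face_area(det.box, frame_width, frame_height)
        seq[det.frame_index] = (area, det.fake_score)
    return sequences


if __name__ == "__main__":
    # Two people in a three-frame clip; the second appears only in frame 1.
    example = [
        FaceDetection(0, 0, (0, 0, 50, 50), 0.9),
        FaceDetection(1, 0, (0, 0, 50, 50), 0.8),
        FaceDetection(1, 1, (10, 10, 20, 30), 0.1),
    ]
    for track, seq in build_gff_sequences(example, 3, 100, 100).items():
        print(track, seq)
```

Keeping one sequence per person, rather than one per video, is what would let a downstream temporal model score a deepfake face separately from the real faces that share the frame. The CNN and fully connected layers that consume these sequences are not sketched here.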
Datasets
FaceForensics++, DFDC, Celeb-DF, DFD, WildDeepFake
Model(s)
EfficientNet-B4, FaceNet, CNN, Fully Connected Neural Network
Author countries
Russia