Face Deepfakes -- A Comprehensive Review

Authors: Tharindu Fernando, Darshana Priyasad, Sridha Sridharan, Arun Ross, Clinton Fookes

Published: 2025-02-13 23:08:05+00:00

AI Summary

This survey paper provides a thorough theoretical analysis of state-of-the-art face deepfake generation and detection methods, systematically evaluating their implications for face biometric recognition. It also outlines key applications and research gaps, proposing future research directions.

Abstract

In recent years, remarkable advancements in deepfake generation technology have led to unprecedented leaps in its realism and capabilities. Despite these advances, we observe a notable lack of structured and in-depth analysis of deepfake technology. The principal aim of this survey is to contribute a thorough theoretical analysis of state-of-the-art face deepfake generation and detection methods. Furthermore, we provide a coherent and systematic evaluation of the implications of deepfakes on face biometric recognition approaches. In addition, we outline key applications of face deepfake technology, elucidating both its positive and negative uses, provide a detailed discussion of the gaps in existing research, and propose key research directions for further investigation.


Key findings
State-of-the-art deepfake generation methods, particularly Wav2Lip, SimSwap, and the First Order Motion Model, effectively fool biometric recognition systems, especially lightweight ones. There is currently no universal deepfake detection method that is robust across all generation techniques and datasets. The review highlights the significant need for research into universal detection, identity recovery, explainable methods, standardized evaluation, and regulatory frameworks.
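
A common way to quantify this kind of finding is to check whether a manipulated face still verifies against the enrolled identity under a face recognition model. Below is a minimal sketch, assuming a generic torchvision ResNet-18 as a stand-in embedding network (a real evaluation would use a dedicated face recognition model such as ArcFace) and hypothetical pre-cropped images enrolled_face.png and deepfake_probe.png.

```python
# Minimal sketch: does a deepfaked face still verify against the enrolled identity?
# The backbone is an ImageNet ResNet-18 used only as a stand-in embedding network.
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Stand-in embedding extractor: ResNet-18 with the classification head removed.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

def embed(path: str) -> torch.Tensor:
    """Return an L2-normalised embedding for a pre-cropped face image."""
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return F.normalize(backbone(img), dim=1)

# Hypothetical file names: a genuine enrolment image and a deepfaked probe.
genuine = embed("enrolled_face.png")
probe = embed("deepfake_probe.png")

similarity = F.cosine_similarity(genuine, probe).item()
threshold = 0.5  # placeholder verification threshold; tuned per system in practice
print(f"cosine similarity = {similarity:.3f}, "
      f"{'accepted (spoof succeeds)' if similarity >= threshold else 'rejected'}")
```
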
Approach
The paper conducts a comprehensive review of existing literature on face deepfake generation and detection, focusing on algorithmic details like architectures, training paradigms, loss functions, and evaluation metrics. It analyzes both video and audio deepfakes and their impact on biometric systems.
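
Since the review compares detection methods primarily through metrics such as accuracy, AUC, and equal error rate, the snippet below is a minimal sketch of how these are typically computed from per-sample detector scores; the label and score arrays are illustrative only.

```python
# Minimal sketch of common deepfake-detection metrics (accuracy, AUC, EER)
# computed from per-sample detector scores; the arrays below are illustrative.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

labels = np.array([0, 0, 0, 1, 1, 1, 1, 0])                    # 0 = real, 1 = fake
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.9, 0.6, 0.2])   # detector "fake" probability

auc = roc_auc_score(labels, scores)
accuracy = np.mean((scores >= 0.5) == labels)

# Equal Error Rate: operating point where false-positive and false-negative rates meet.
fpr, tpr, _ = roc_curve(labels, scores)
fnr = 1 - tpr
eer = fpr[np.nanargmin(np.abs(fnr - fpr))]

print(f"AUC = {auc:.3f}, accuracy = {accuracy:.3f}, EER = {eer:.3f}")
```
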
Datasets
VoxCeleb2, FaceForensics++, CelebA, CREMA-D, DF-TIMIT, VidTIMIT, UADFV, DeeperForensics-1.0, DFD, and a self-built dataset. These datasets are referenced across the reviewed literature rather than being central to the paper's main contribution.
Model(s)
MesoNet, Capsule Network, ForensicTransfer, ResNet-18, XceptionNet, MobileNet, ResNet101, InceptionV3, DenseNet121, InceptionResNetV2, DenseNet169, LSTM, 3DCNN, Interpretable Spatial-Temporal Video Transformer (ISTVT), Convolutional Vision-Transformer (CVT), Spatial Relation Graph Unit (SRGU), VAE, GANs (various, including CycleGAN, Recurrent GANs, and Multimodal GANs), Diffusion Models, Autoencoders, Face2Face, ReenactGAN, GANimation, First Order Motion Model, Talking Heads, FC-TFG, Multimodal Talking Faces, EmoGen, AVFR-GAN, PNCC GAN, SimSwap, FSGAN, FSGANv2, FaceShifter, HiRFS, MegaFS, FaceDancer. These models are discussed in the reviewed literature rather than used for the paper's main contribution; a rough frame-level detector sketch follows below.
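
Many of the detectors listed above (MesoNet, XceptionNet, and similar CNN backbones) are frame-level binary classifiers trained with a cross-entropy objective on face crops. The sketch below is a rough MesoNet-style shallow CNN; layer widths, input size, and the toy training step are illustrative assumptions rather than the published architecture.

```python
# Rough sketch of a frame-level deepfake detector in the spirit of MesoNet:
# a shallow CNN over face crops with a single real/fake logit. Layer widths and
# the training snippet are illustrative, not the published architecture.
import torch
import torch.nn as nn

class ShallowFrameDetector(nn.Module):
    def __init__(self):
        super().__init__()
        def block(c_in, c_out):
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
            )
        self.features = nn.Sequential(
            block(3, 8), block(8, 16), block(16, 32), block(32, 64)
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, 1),   # single logit: > 0 means "fake"
        )

    def forward(self, x):
        return self.head(self.features(x))

# Illustrative training step on a random batch standing in for face crops.
model = ShallowFrameDetector()
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

frames = torch.randn(4, 3, 256, 256)             # toy batch of face crops
labels = torch.tensor([[0.], [1.], [1.], [0.]])  # 0 = real, 1 = fake

optimizer.zero_grad()
logits = model(frames)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
print(f"training loss on the toy batch: {loss.item():.4f}")
```
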
Author countries
Australia, United States