Deepfake Media Generation and Detection in the Generative AI Era: A Survey and Outlook

Authors: Florinel-Alin Croitoru, Andrei-Iulian Hiji, Vlad Hondru, Nicolae Catalin Ristea, Paul Irofti, Marius Popescu, Cristian Rusu, Radu Tudor Ionescu, Fahad Shahbaz Khan, Mubarak Shah

Published: 2024-11-29 08:29:25+00:00

AI Summary

This paper surveys deepfake generation and detection techniques, including recent advancements like diffusion models and Neural Radiance Fields. It also introduces a novel multimodal benchmark, BioDeepAV, to evaluate deepfake detectors on out-of-distribution content, revealing limitations in the generalization capabilities of state-of-the-art detectors.

Abstract

With the recent advancements in generative modeling, the realism of deepfake content has been increasing at a steady pace, even reaching the point where people often fail to detect manipulated media content online, thus being deceived into various kinds of scams. In this paper, we survey deepfake generation and detection techniques, including the most recent developments in the field, such as diffusion models and Neural Radiance Fields. Our literature review covers all deepfake media types, comprising image, video, audio and multimodal (audio-visual) content. We identify various kinds of deepfakes, according to the procedure used to alter or generate the fake content. We further construct a taxonomy of deepfake generation and detection methods, illustrating the important groups of methods and the domains where these methods are applied. Next, we gather datasets used for deepfake detection and provide updated rankings of the best performing deepfake detectors on the most popular datasets. In addition, we develop a novel multimodal benchmark to evaluate deepfake detectors on out-of-distribution content. The results indicate that state-of-the-art detectors fail to generalize to deepfake content generated by unseen deepfake generators. Finally, we propose future directions to obtain robust and powerful deepfake detectors. Our project page and new benchmark are available at https://github.com/CroitoruAlin/biodeep.


Key findings
State-of-the-art deepfake detectors exhibit poor generalization to deepfake content generated by unseen deepfake generators. The BioDeepAV benchmark highlights significant performance drops of existing detectors on realistic deepfakes produced by newer generative models. This underscores the need for more robust and generalizable deepfake detection methods.
Approach
The authors conduct a comprehensive survey of deepfake generation and detection methods across image, video, audio, and multimodal domains. They develop a novel multimodal benchmark, BioDeepAV, to evaluate the generalization capacity of existing deepfake detectors to out-of-distribution content generated by unseen deepfake generators.
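A minimal sketch of the cross-generator (out-of-distribution) evaluation protocol described above, assuming PyTorch and scikit-learn. The detector interface, data loaders, and the convention that the model outputs a single real/fake logit are hypothetical placeholders, not the authors' actual pipeline.

```python
# Hypothetical sketch: score a frozen detector on deepfakes from generators
# that were never seen during training and report AUC, the metric commonly
# used on deepfake detection benchmarks.
import torch
from sklearn.metrics import roc_auc_score

@torch.no_grad()
def evaluate_detector(detector, loader, device="cpu"):
    """`loader` yields (inputs, labels) with label 1 = fake, 0 = real.
    Assumes the detector returns one logit per sample (shape: batch x 1)."""
    detector.eval().to(device)
    scores, labels = [], []
    for inputs, targets in loader:
        logits = detector(inputs.to(device))
        scores.extend(torch.sigmoid(logits).squeeze(1).cpu().tolist())
        labels.extend(targets.tolist())
    return roc_auc_score(labels, scores)

# Usage (hypothetical loaders):
# auc_seen   = evaluate_detector(detector, loader_seen_generators)
# auc_unseen = evaluate_detector(detector, loader_unseen_generators)
# A large gap between the two AUC values indicates poor generalization.
```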
Datasets
DFFD, DiffusionFace, ForgeryNet, FaceForensics++, DeeperForensics, Celeb-DF, WildDeepfake, DeepFake-TIMIT, UADFV, GenVideo, WaveFake, ASVspoof 2019-LA, ASVspoof 2021-LA, ASVspoof 2021-DF, In-the-Wild, ADD 2022, ADD 2023, FoR, MLAAD, FakeAVCeleb, LAV-DF, DFDC, HDTF, LAION-Face, LibriTTS, a dataset of English dialects, TalkingHead-1KH.
Model(s)
Various CNNs, Transformers, RNNs, GANs, VAEs, GCNs, NeRFs, and hybrid architectures are used across deepfake generation and detection, depending on the modality. Specific models mentioned include EfficientNet, XceptionNet, ResNet, StyleGAN, Stable Diffusion, SDXL, RawNet2, wav2vec 2.0, and Swin Transformer.
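As an illustration of how one of the listed detection backbones is typically repurposed, below is a minimal sketch of an image-level detector built from an EfficientNet with a single real/fake logit. It assumes the timm library and is not the survey's or any cited paper's exact model.

```python
# Hypothetical sketch: an EfficientNet backbone (one of the architectures
# listed above) adapted for binary deepfake classification.
import timm
import torch

def build_detector(backbone: str = "efficientnet_b0") -> torch.nn.Module:
    # num_classes=1 replaces the ImageNet head with a single real/fake logit.
    return timm.create_model(backbone, pretrained=True, num_classes=1)

if __name__ == "__main__":
    model = build_detector()
    frames = torch.randn(4, 3, 224, 224)   # dummy batch of face crops
    logits = model(frames)                  # shape: (4, 1); higher = more likely fake
    print(torch.sigmoid(logits).squeeze(1))
```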
Author countries
Romania, UAE, Sweden, USA