DF40: Toward Next-Generation Deepfake Detection

Authors: Zhiyuan Yan, Taiping Yao, Shen Chen, Yandan Zhao, Xinghe Fu, Junwei Zhu, Donghao Luo, Chengjie Wang, Shouhong Ding, Yunsheng Wu, Li Yuan

Published: 2024-06-19 12:35:02+00:00

AI Summary

The paper introduces DF40, a large-scale, diverse deepfake detection benchmark dataset containing 40 distinct deepfake techniques. It addresses limitations of existing datasets by incorporating diverse forgery methods and realistic deepfakes, enabling more comprehensive evaluation and revealing insightful findings about deepfake detection.

Abstract

We propose a new comprehensive benchmark to advance the current deepfake detection field into the next generation. Existing works predominantly identify top-notch detection algorithms and models by following a common practice: training detectors on one specific dataset (e.g., FF++) and testing them on other prevalent deepfake datasets. This protocol is often regarded as a golden compass for navigating SoTA detectors. But can these stand-out winners truly tackle the myriad of realistic and diverse deepfakes lurking in the real world? If not, what underlying factors contribute to this gap? In this work, we find that the dataset (both train and test) can be the primary culprit due to: (1) forgery diversity: deepfake techniques commonly include both face forgery and entire image synthesis, yet most existing datasets cover only some of these types, with limited forgery methods implemented; (2) forgery realism: the dominant training dataset, FF++, contains out-of-date forgery techniques from the past four years, and honing skills on these forgeries makes it difficult to guarantee effective generalization to today's SoTA deepfakes; (3) evaluation protocol: most detection works evaluate on a single forgery type, which hinders the development of universal deepfake detectors. To address this dilemma, we construct a highly diverse deepfake detection dataset called DF40, comprising 40 distinct deepfake techniques. We then conduct comprehensive evaluations using 4 standard evaluation protocols and 8 representative detection methods, resulting in over 2,000 evaluations. Through these evaluations, we provide an extensive analysis from various perspectives, leading to 7 new insightful findings. We also raise 4 valuable yet previously underexplored research questions to inspire future works. Our project page is https://github.com/YZY-stack/DF40.


Key findings
State-of-the-art deepfake detectors do not always significantly outperform simple baselines on DF40. CLIP-based detectors excel thanks to large-scale pre-training, underscoring the value of pre-trained features. Both the forgery method and the data domain strongly affect detection performance, revealing limitations in current models' generalization ability.
Approach
DF40 tackles the limited diversity and realism of existing deepfake datasets by providing a new benchmark with 40 distinct deepfake generation methods covering multiple forgery types. The authors then perform extensive evaluations using multiple detection models and protocols to analyze detection performance and to surface key findings and open research questions.
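The evaluation protocol described above boils down to scoring a detector trained on one domain against held-out forgery domains, typically with frame-level ROC-AUC. The sketch below is purely illustrative (the domain names, scores, and labels are made up, not DF40 numbers); it shows the rank-based AUC computation and the per-domain reporting loop such a benchmark relies on.

```python
def roc_auc(labels, scores):
    """Rank-based ROC-AUC (equivalent to the normalized Mann-Whitney U
    statistic): the probability that a random fake sample scores higher
    than a random real one, counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]  # fake
    neg = [s for y, s in zip(labels, scores) if y == 0]  # real
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical scores a detector trained on FF++ might produce on an
# in-domain test set vs. an unseen forgery domain (all values invented).
domains = {
    "FF++ (in-domain)":    ([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]),
    "DF40 (cross-domain)": ([1, 1, 0, 0], [0.6, 0.4, 0.5, 0.3]),
}
for name, (labels, scores) in domains.items():
    print(f"{name}: AUC = {roc_auc(labels, scores):.2f}")
```

In practice such loops iterate over each of the 40 forgery methods separately, which is what makes it possible to attribute performance drops to either the forgery method or the data domain.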
Datasets
FaceForensics++, Celeb-DF, UADFV, CelebA, FFHQ, VFHQ, GenImage
Model(s)
Xception, RECCE, SPSL, SRM, SBI, RFM, CLIP
Author countries
China