DeepfakeBench: A Comprehensive Benchmark of Deepfake Detection

Authors: Zhiyuan Yan, Yong Zhang, Xinhang Yuan, Siwei Lyu, Baoyuan Wu

Published: 2023-07-04 01:34:41+00:00

AI Summary

DeepfakeBench is the first comprehensive benchmark for deepfake detection, addressing the lack of standardization in data processing, experimental settings, and evaluation metrics. It offers a unified data management system, an integrated framework for state-of-the-art methods, and standardized evaluation protocols to promote transparency and reproducibility.

Abstract

A critical yet frequently overlooked challenge in the field of deepfake detection is the lack of a standardized, unified, comprehensive benchmark. This issue leads to unfair performance comparisons and potentially misleading results. Specifically, there is a lack of uniformity in data processing pipelines, resulting in inconsistent data inputs for detection models. Additionally, there are noticeable differences in experimental settings, and evaluation strategies and metrics lack standardization. To fill this gap, we present the first comprehensive benchmark for deepfake detection, called DeepfakeBench, which offers three key contributions: 1) a unified data management system to ensure consistent input across all detectors, 2) an integrated framework for state-of-the-art methods implementation, and 3) standardized evaluation metrics and protocols to promote transparency and reproducibility. Featuring an extensible, modular-based codebase, DeepfakeBench contains 15 state-of-the-art detection methods, 9 deepfake datasets, a series of deepfake detection evaluation protocols and analysis tools, as well as comprehensive evaluations. Moreover, we provide new insights based on extensive analysis of these evaluations from various perspectives (e.g., data augmentations, backbones). We hope that our efforts could facilitate future research and foster innovation in this increasingly critical domain. All codes, evaluations, and analyses of our benchmark are publicly available at https://github.com/SCLBD/DeepfakeBench.


Key findings
Under standardized conditions, naive detectors such as Xception and EfficientNet-B4 performed surprisingly well compared to more complex methods. The choice of data augmentations significantly affected detector performance, highlighting the need for further research in this area, and the backbone architecture likewise played a crucial role in detection results.
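The augmentation finding suggests ablating each augmentation on and off under otherwise identical settings. A minimal sketch of such an ablation grid follows; the augmentation names, the list-of-floats stand-in for an image, and `make_pipeline` are all hypothetical illustrations, not DeepfakeBench's actual API.

```python
import itertools
import random

# Hypothetical augmentations operating on a 1-D list of pixel intensities
# (a stand-in for a real image tensor).
AUGMENTATIONS = {
    "horizontal_flip": lambda img: img[::-1],
    "brightness_jitter": lambda img: [min(1.0, max(0.0, x + random.uniform(-0.1, 0.1)))
                                      for x in img],
}

def make_pipeline(enabled):
    """Compose the enabled augmentations, applied in order, into one callable."""
    def apply(img):
        for name in enabled:
            img = AUGMENTATIONS[name](img)
        return img
    return apply

# Enumerate every on/off combination so each augmentation's effect can be
# isolated by training/evaluating once per setting.
names = sorted(AUGMENTATIONS)
settings = []
for r in range(len(names) + 1):
    settings.extend(itertools.combinations(names, r))
# settings now covers: no augmentation, each alone, and both together.
```

Each entry of `settings` would drive one standardized training run, so any performance difference is attributable to the augmentations alone.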
Approach
DeepfakeBench provides a unified platform for deepfake detection by standardizing data processing pipelines, implementing 15 state-of-the-art detection methods within an integrated framework, and using standardized evaluation metrics and protocols. It also offers analysis tools to study factors influencing detection performance.
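The approach above can be sketched as a registry-based framework: detectors register under config-friendly names, share one preprocessing pipeline, and are scored with one metric. Everything below is a toy illustration under assumed names (`DETECTOR_REGISTRY`, `register_detector`, `ToyThresholdDetector`), not DeepfakeBench's actual code.

```python
from typing import Callable, Dict, List, Type

# Registry mapping detector names to classes, so a config file can select
# a method by name rather than by import path.
DETECTOR_REGISTRY: Dict[str, Type] = {}

def register_detector(name: str) -> Callable[[Type], Type]:
    """Decorator that registers a detector class under a given name."""
    def wrap(cls: Type) -> Type:
        DETECTOR_REGISTRY[name] = cls
        return cls
    return wrap

def preprocess(frame: List[float]) -> List[float]:
    """Standardized input pipeline shared by all detectors: here, a
    stand-in that rescales 8-bit pixel values to [0, 1]."""
    return [x / 255.0 for x in frame]

@register_detector("toy_threshold")
class ToyThresholdDetector:
    """Stand-in detector: scores a frame by its mean normalized intensity."""
    def predict(self, frame: List[float]) -> float:
        x = preprocess(frame)
        return sum(x) / len(x)

def auc(labels: List[int], scores: List[float]) -> float:
    """Standardized metric: AUC via the rank (Mann-Whitney) formulation."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Every detector sees identical preprocessed inputs and identical metrics,
# which is the point of a unified benchmark.
detector = DETECTOR_REGISTRY["toy_threshold"]()
frames = [[10, 20, 30], [200, 210, 220], [15, 25, 5], [180, 240, 230]]
labels = [0, 1, 0, 1]  # 0 = real, 1 = fake
scores = [detector.predict(f) for f in frames]
```

The registry pattern keeps the codebase extensible: adding a sixteenth detector means adding one decorated class, with no change to the training or evaluation loop.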
Datasets
FaceForensics++, CelebDF-v1, CelebDF-v2, DeepFakeDetection, DeepFake Detection Challenge Preview, DeepFake Detection Challenge, UADFV, FaceShifter, DeeperForensics-1.0
Model(s)
MesoNet, MesoInception, CNN-Aug (ResNet), EfficientNet-B4, Xception, Capsule, DSP-FWA (Xception), Face X-ray (HRNet), FFD (Xception), CORE (Xception), RECCE, UCF (Xception), F3Net (Xception), SPSL (Xception), SRM (Xception)
Author countries
China, USA