Transcending Forgery Specificity with Latent Space Augmentation for Generalizable Deepfake Detection

Authors: Zhiyuan Yan, Yuhao Luo, Siwei Lyu, Qingshan Liu, Baoyuan Wu

Published: 2023-11-19 09:41:10+00:00

AI Summary

The paper proposes LSDA (Latent Space Data Augmentation), a deepfake detection method addressing the generalization problem by augmenting data in the latent space. This approach creates variations within and across forgery features, improving the model's ability to generalize to unseen deepfakes and outperforming state-of-the-art detectors.

Abstract

Deepfake detection faces a critical generalization hurdle, with performance deteriorating when there is a mismatch between the distributions of training and testing data. A broadly received explanation is the tendency of these detectors to be overfitted to forgery-specific artifacts, rather than learning features that are widely applicable across various forgeries. To address this issue, we propose a simple yet effective detector called LSDA (Latent Space Data Augmentation), which is based on a heuristic idea: representations with a wider variety of forgeries should be able to learn a more generalizable decision boundary, thereby mitigating the overfitting of method-specific features (see the toy-example figure in the paper). Following this idea, we propose to enlarge the forgery space by constructing and simulating variations within and across forgery features in the latent space. This approach encompasses the acquisition of enriched, domain-specific features and the facilitation of smoother transitions between different forgery types, effectively bridging domain gaps. Our approach culminates in refining a binary classifier that leverages the distilled knowledge from the enhanced features, striving for a generalizable deepfake detector. Comprehensive experiments show that our proposed method is surprisingly effective and transcends state-of-the-art detectors across several widely used benchmarks.

Key findings

In cross-dataset evaluation, LSDA significantly outperforms state-of-the-art deepfake detectors on multiple datasets. It is robust to unseen perturbations and generalizes better than RGB-based augmentation methods. Ablation studies confirm the effectiveness of both the within-domain and cross-domain augmentation strategies, as well as of using ArcFace to learn real-face features.

Approach

LSDA augments deepfake data in the latent space using within-domain (Centrifugal, Affine, Additive transformations) and cross-domain (Mixup) techniques. A teacher-student architecture is employed, with teacher encoders learning domain-specific features and a student encoder learning a generalized decision boundary from augmented features. A binary classifier then distinguishes real from fake videos.
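The four augmentations named above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the summary only names the transformations, so the function names, the parameters (`sigma`, `scale_range`, `shift`, `strength`, `alpha`) and the exact formulas are assumptions chosen for illustration. Latent features are treated here as flat vectors of shape (batch, dim), which may differ from the feature maps an EfficientNet-B4 encoder would produce.

```python
import numpy as np

# Illustrative sketch only: all formulas and parameter names below are
# assumptions, not taken from the LSDA paper or its code.

def additive(z, rng, sigma=0.1):
    """Within-domain: add zero-mean Gaussian noise to each latent vector."""
    return z + sigma * rng.standard_normal(z.shape)

def affine(z, rng, scale_range=(0.9, 1.1), shift=0.05):
    """Within-domain: apply a random per-sample scale and offset."""
    n = z.shape[0]
    a = rng.uniform(scale_range[0], scale_range[1], size=(n, 1))
    b = rng.uniform(-shift, shift, size=(n, 1))
    return a * z + b

def centrifugal(z, strength=0.5):
    """Within-domain: push each feature away from the batch centroid
    of its own forgery domain (assumed reading of 'centrifugal')."""
    center = z.mean(axis=0, keepdims=True)
    return z + strength * (z - center)

def cross_domain_mixup(z_a, z_b, rng, alpha=0.5):
    """Cross-domain: convex combination of features from two forgery types,
    with a per-sample mixing weight drawn from Beta(alpha, alpha)."""
    lam = rng.beta(alpha, alpha, size=(z_a.shape[0], 1))
    return lam * z_a + (1.0 - lam) * z_b

# Usage with random stand-ins for two forgery domains' latent features.
rng = np.random.default_rng(0)
z_domain_a = rng.standard_normal((4, 8))
z_domain_b = rng.standard_normal((4, 8))
augmented = np.concatenate([
    additive(z_domain_a, rng),
    affine(z_domain_a, rng),
    centrifugal(z_domain_a),
    cross_domain_mixup(z_domain_a, z_domain_b, rng),
])
```

In a teacher-student setup like the one described, features of this kind would come from the teacher encoders and the augmented batch would serve as the target for the student; that wiring is omitted here because the summary does not specify it.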

Datasets

FaceForensics++ (FF++) [c23 version], DeepfakeDetection (DFD), Deepfake Detection Challenge (DFDC), preview version of DFDC (DFDCP), and CelebDF (CDF)

Model(s)

EfficientNet-B4 (as default encoder for forgery features), ArcFace (for real face encoder), iResNet101 (alternative real encoder in ablation study)

Author countries

China, USA
