Decoupling Forgery Semantics for Generalizable Deepfake Detection

Authors: Wei Ye, Xinan He, Feng Ding

Published: 2024-06-14 06:00:14+00:00

AI Summary

This paper presents a deepfake detection method that improves generalization by decoupling forgery semantics. It separates the forgery semantics shared across deepfake techniques from technique-specific forgery semantics and irrelevant content semantics, and trains the detector on the shared ones. An adaptive high-pass module and a two-stage training strategy are used to make the decoupled semantics more independent.

Abstract

In this paper, we propose a novel method for detecting DeepFakes, enhancing the generalization of detection through semantic decoupling. There are now multiple DeepFake forgery technologies that not only possess unique forgery semantics but may also share common forgery semantics. The unique forgery semantics and irrelevant content semantics may promote over-fitting and hamper generalization for DeepFake detectors. For our proposed method, after decoupling, the common forgery semantics could be extracted from DeepFakes, and subsequently be employed for developing the generalizability of DeepFake detectors. Also, to pursue additional generalizability, we designed an adaptive high-pass module and a two-stage training strategy to improve the independence of decoupled semantics. Evaluation on FF++, Celeb-DF, DFD, and DFDC datasets showcases our method's excellent detection and generalization performance. Code is available at: https://github.com/leaffeall/DFS-GDD.

Key findings

The paper reports that the proposed method outperforms existing methods in both intra-domain and cross-domain deepfake detection. Ablation studies support the contribution of each proposed module (adaptive high-pass filter, multi-scale feature extraction and fusion). Grad-CAM visualizations indicate that the detector attends to common forgery cues rather than irrelevant content, which is consistent with its better generalization.

Approach

The method uses a two-stage training process. The first stage separates irrelevant content semantics from forgery semantics using an encoder-decoder architecture. The second stage further splits the forgery semantics into common and unique parts, and only the common part is used for detection. An adaptive high-pass filter extracts high-frequency features.
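
To make the high-pass step concrete, here is a minimal sketch of a frequency-domain high-pass filter on a single grayscale image. It is not the paper's module: the function name `high_pass` and the `cutoff` parameter are assumptions made for illustration, and the cutoff is a fixed argument here, whereas the paper describes its module as adaptive without this summary giving the details.

```python
import numpy as np


def high_pass(image: np.ndarray, cutoff: float) -> np.ndarray:
    """Remove spatial frequencies at or below `cutoff` from a 2-D image.

    `cutoff` is a hypothetical knob in [0, 1], expressed as a fraction of
    the Nyquist frequency (0.5 cycles/pixel). In an adaptive variant it
    would be set or learned per input; here it is a plain argument.
    """
    h, w = image.shape
    # Frequency coordinates in cycles/pixel, laid out to match np.fft.fft2.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    # Keep only components strictly above the cutoff radius.
    # The DC component (radius 0) is always removed.
    mask = radius > cutoff * 0.5
    spectrum = np.fft.fft2(image)
    # The mask is symmetric, so the inverse transform of a real image is
    # real up to floating-point error; drop the residual imaginary part.
    return np.real(np.fft.ifft2(spectrum * mask))


# A flat image has no high-frequency content, a checkerboard has only that.
flat = np.ones((8, 8))
rows, cols = np.indices((8, 8))
checker = (-1.0) ** (rows + cols)
flat_out = high_pass(flat, cutoff=0.2)
checker_out = high_pass(checker, cutoff=0.2)
```

In this sketch the flat image is suppressed entirely while the checkerboard passes through unchanged, which is the behavior a detector relies on when it looks for forgery traces in fine texture rather than in smooth image content.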

Datasets

FaceForensics++ (FF++), Celeb-DF, DeepfakeDetection (DFD), DFDC

Model(s)

SwiftFormer-L1, Xception, Custom encoder-decoder architecture with convolutional layers

Author countries

China
