Cross-Domain Local Characteristic Enhanced Deepfake Video Detection

Authors: Zihan Liu, Hanyi Wang, Shilin Wang

Published: 2022-11-07 07:44:09+00:00

AI Summary

This paper proposes Cross-Domain Local Forensics (XDLF), a deepfake video detection pipeline that leverages local forgery patterns from space, frequency, and time domains. XDLF enhances subtle artifacts in forgery-sensitive facial regions, improving generalization to unseen manipulations.

Abstract

As ultra-realistic face forgery techniques emerge, deepfake detection has attracted increasing attention due to security concerns. Many detectors cannot achieve accurate results when detecting unseen manipulations despite excellent performance on known forgeries. In this paper, we are motivated by the observation that the discrepancies between real and fake videos are extremely subtle and localized, and inconsistencies or irregularities can exist in some critical facial regions across various information domains. To this end, we propose a novel pipeline, Cross-Domain Local Forensics (XDLF), for more general deepfake video detection. In the proposed pipeline, a specialized framework is presented to simultaneously exploit local forgery patterns from space, frequency, and time domains, thus learning cross-domain features to detect forgeries. Moreover, the framework leverages four high-level forgery-sensitive local regions of a human face to guide the model to enhance subtle artifacts and localize potential anomalies. Extensive experiments on several benchmark datasets demonstrate the impressive performance of our method, and we achieve superiority over several state-of-the-art methods on cross-dataset generalization. We also examined the factors that contribute to its performance through ablations, which suggests that exploiting cross-domain local characteristics is a noteworthy direction for developing more general deepfake detectors.


Key findings

XDLF outperforms several state-of-the-art methods on cross-dataset generalization, demonstrating its robustness to unseen forgeries. Ablation studies show the importance of exploiting cross-domain local characteristics and the effectiveness of the FSLR-Guided Feature Enhancement module.

Approach

XDLF uses a two-stream 3D CNN architecture to extract spatio-temporal features from RGB images and frequency maps. It incorporates forgery-sensitive local regions (FSLR) to enhance subtle artifacts and employs cross-attention and feature fusion to combine information from multiple domains.
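
The two-stream idea described above can be illustrated with a small NumPy sketch. It is not the authors' implementation. The log-amplitude FFT used as the frequency map, the unparameterized attention (no learned projections), the residual-plus-concatenation fusion, and all function names are assumptions made for illustration. In the paper the features come from 3D CNN streams; here, flattened toy frames stand in for them.

```python
import numpy as np

def frequency_map(frame_gray):
    """Log-amplitude spectrum of one grayscale frame (assumed transform)."""
    spectrum = np.fft.fftshift(np.fft.fft2(frame_gray))
    return np.log1p(np.abs(spectrum))

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context):
    """Scaled dot-product attention where one stream attends to the other.

    queries: (n_q, d), context: (n_c, d); returns (n_q, d).
    No learned projection matrices are used in this sketch.
    """
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ context

def fuse(rgb_tokens, freq_tokens):
    """Enhance each stream with the other, then concatenate (assumed fusion)."""
    rgb_enh = rgb_tokens + cross_attention(rgb_tokens, freq_tokens)
    freq_enh = freq_tokens + cross_attention(freq_tokens, rgb_tokens)
    return np.concatenate([rgb_enh, freq_enh], axis=-1)

# Toy clip: 4 grayscale frames of 32x32 random pixels (hypothetical data).
rng = np.random.default_rng(0)
clip = rng.random((4, 32, 32))
freq_clip = np.stack([frequency_map(f) for f in clip])

# Stand-in "features": the first 16 values of each flattened frame.
rgb_tokens = clip.reshape(4, -1)[:, :16]
freq_tokens = freq_clip.reshape(4, -1)[:, :16]

fused = fuse(rgb_tokens, freq_tokens)
print(fused.shape)  # (4, 32)
```

Each stream keeps its own features through the residual connection and gains information from the other domain through attention, which is one simple way to realize the "cross-attention and feature fusion" step.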

Datasets

FaceForensics++, Celeb-DF, DeepFake Detection Challenge (DFDC)

Model(s)

3D ResNet-50

Author countries

China