Locate and Verify: A Two-Stream Network for Improved Deepfake Detection


Authors: Chao Shuai, Jieming Zhong, Shuang Wu, Feng Lin, Zhibo Wang, Zhongjie Ba, Zhenguang Liu, Lorenzo Cavallaro, Kui Ren

Published: 2023-09-20 08:25:19+00:00

AI Summary

This paper introduces a two-stream network for deepfake detection that improves robustness and generalizability. The approach addresses overfitting by focusing on potential forgery regions and uses a semi-supervised strategy to estimate patch-level forgery locations, outperforming existing methods on six benchmark datasets.

Abstract

Deepfake has taken the world by storm, triggering a trust crisis. Current deepfake detection methods are typically inadequate in generalizability, with a tendency to overfit to image contents such as the background, which are frequently occurring but relatively unimportant in the training dataset. Furthermore, current methods heavily rely on a few dominant forgery regions and may ignore other equally important regions, leading to inadequate uncovering of forgery cues. In this paper, we strive to address these shortcomings from three aspects: (1) We propose an innovative two-stream network that effectively enlarges the potential regions from which the model extracts forgery evidence. (2) We devise three functional modules to handle the multi-stream and multi-scale features in a collaborative learning scheme. (3) Confronted with the challenge of obtaining forgery annotations, we propose a Semi-supervised Patch Similarity Learning strategy to estimate patch-level forged location annotations. Empirically, our method demonstrates significantly improved robustness and generalizability, outperforming previous methods on six benchmarks, and improving the frame-level AUC on Deepfake Detection Challenge preview dataset from 0.797 to 0.835 and video-level AUC on CelebDF_v1 dataset from 0.811 to 0.847. Our implementation is available at https://github.com/sccsok/Locate-and-Verify.
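The Semi-supervised Patch Similarity Learning strategy turns coarse forgery information into patch-level supervision. The sketch below shows one plausible form of that idea, not the paper's exact procedure: it assumes a binary pixel mask (1 = forged) is available for an annotated frame, for example by differencing a fake frame against its real source. Each patch is labeled forged if more than a threshold fraction of its pixels are forged, and a pairwise target marks two patches as "similar" when they share a label. The function names, the 0.5 threshold, and the mean-squared loss are hypothetical choices for illustration.

```python
# Hypothetical sketch of patch-level pseudo-labels and a patch-similarity target.
# Assumption: a binary H x W mask (1 = forged pixel) exists for this frame.

def patch_labels(mask, patch, thresh=0.5):
    """Label each non-overlapping patch 1 (forged) if the fraction of forged
    pixels in it exceeds `thresh`, else 0 (real). Row-major patch order."""
    h, w = len(mask), len(mask[0])
    assert h % patch == 0 and w % patch == 0, "mask size must be divisible by patch size"
    labels = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            forged = sum(mask[r][c]
                         for r in range(top, top + patch)
                         for c in range(left, left + patch))
            labels.append(1 if forged / (patch * patch) > thresh else 0)
    return labels

def similarity_target(labels):
    """Pairwise target: 1 if two patches share a label, else 0."""
    return [[1 if a == b else 0 for b in labels] for a in labels]

def similarity_loss(pred, target):
    """Mean squared error between predicted pairwise similarities and the target."""
    n = len(target)
    return sum((pred[i][j] - target[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)

mask = [[0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
labels = patch_labels(mask, patch=2)   # [0, 1, 0, 0]
target = similarity_target(labels)
```

In a semi-supervised setting, frames without a usable mask would contribute only their image-level label; this sketch covers only the annotated case.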

Key findings

The proposed method outperforms previous state-of-the-art methods on six benchmark datasets. It raises frame-level AUC on the Deepfake Detection Challenge preview dataset from 0.797 to 0.835 and video-level AUC on CelebDF_v1 from 0.811 to 0.847. The semi-supervised patch similarity learning strategy compensates effectively for the lack of forgery location annotations.
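The results above mix frame-level and video-level AUC. As a hedged illustration of the difference, the sketch below computes ROC AUC with the Mann-Whitney formulation (the probability that a random fake outranks a random real sample, ties counted as one half) and derives video-level scores by averaging frame scores per video. Averaging is a common convention assumed here; the paper's exact aggregation rule is not stated on this page.

```python
from collections import defaultdict

def auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic (label 1 = fake, 0 = real)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def video_level(frames):
    """frames: list of (video_id, label, score). Assumption: a video's score
    is the mean of its frame scores. Returns (labels, scores) per video."""
    sums, counts, vlabels = defaultdict(float), defaultdict(int), {}
    for vid, y, s in frames:
        sums[vid] += s
        counts[vid] += 1
        vlabels[vid] = y
    vids = sorted(sums)
    return [vlabels[v] for v in vids], [sums[v] / counts[v] for v in vids]

frames = [("a", 1, 0.9), ("a", 1, 0.4), ("b", 0, 0.3), ("b", 0, 0.6), ("c", 1, 0.7)]
frame_auc = auc([y for _, y, _ in frames], [s for _, _, s in frames])  # 5/6
video_auc = auc(*video_level(frames))                                  # 1.0
```

The toy data shows why the two metrics can differ: one low-scoring fake frame hurts frame-level AUC, but averaging within its video can still rank the whole video correctly.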

Approach

The proposed method uses a two-stream network with a localization branch and a classification branch. The localization branch identifies potential forgery regions, guiding the classification branch to focus on these areas for detecting forgery cues. A semi-supervised patch similarity learning strategy is employed to handle the lack of forgery location annotations.
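To make "the localization branch guides the classification branch" concrete, the following is a minimal sketch under stated assumptions, not the paper's Local Forgery Guided Attention module. It assumes the localization branch outputs an H x W map of forgery probabilities and uses it to weight a C x H x W classification feature map during pooling, with weights (1 + m) so that non-forged regions are down-weighted rather than discarded. The function name and the weighting form are hypothetical.

```python
# Hypothetical sketch: localization-guided weighted pooling.
# features: C x H x W nested lists; loc_map: H x W probabilities in [0, 1].

def guided_pool(features, loc_map):
    """Return one pooled value per channel, weighting each spatial position
    by (1 + m) where m is the predicted forgery probability there."""
    h, w = len(loc_map), len(loc_map[0])
    weights = [[1.0 + loc_map[r][c] for c in range(w)] for r in range(h)]
    total_w = sum(sum(row) for row in weights)
    pooled = []
    for channel in features:
        acc = sum(channel[r][c] * weights[r][c] for r in range(h) for c in range(w))
        pooled.append(acc / total_w)
    return pooled

features = [
    [[1, 1], [1, 1]],   # channel 0: uniform response
    [[0, 4], [0, 0]],   # channel 1: responds only at position (0, 1)
]
loc_map = [[0.0, 1.0], [0.0, 0.0]]      # localization flags position (0, 1)
pooled = guided_pool(features, loc_map)  # channel 1 rises from 1.0 (plain mean) to 1.6
```

The residual form (1 + m) is one design choice among several; a multiplicative mask m alone would zero out regions the localization branch misses, which runs against the paper's goal of enlarging the regions from which evidence is drawn.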

Datasets

FaceForensics++, CelebDF (v1 and v2), Deepfake Detection Challenge preview, DeepFakeDetection, DeeperForensics-1.0

Model(s)

Two-stream network with Xception backbone, Cross-modality Consistency Enhancement (CMCE) module, Local Forgery Guided Attention (LFGA) module, Multi-scale Patch Feature Fusion (MPFF) module
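The Multi-scale Patch Feature Fusion module combines patch features at several granularities. As a hypothetical illustration of what "multi-scale patch features" can mean, and not the MPFF design itself (see the linked repository for that), the sketch below average-pools a single-channel feature map over s x s patch grids for several grid sizes and concatenates the results. The grid sizes (1, 2, 4) and function names are assumptions.

```python
# Hypothetical sketch: multi-scale patch pooling of one H x W feature map.

def patch_pool(fmap, grid):
    """Average-pool fmap into a grid x grid set of patches (row-major)."""
    h, w = len(fmap), len(fmap[0])
    assert h % grid == 0 and w % grid == 0, "map size must be divisible by grid"
    ph, pw = h // grid, w // grid
    out = []
    for gr in range(grid):
        for gc in range(grid):
            vals = [fmap[r][c]
                    for r in range(gr * ph, (gr + 1) * ph)
                    for c in range(gc * pw, (gc + 1) * pw)]
            out.append(sum(vals) / len(vals))
    return out

def multi_scale_patch_features(fmap, grids=(1, 2, 4)):
    """Concatenate patch averages from coarse (global) to fine grids."""
    feats = []
    for g in grids:
        feats.extend(patch_pool(fmap, g))
    return feats

fmap = [[r * 4 + c for c in range(4)] for r in range(4)]  # values 0..15
feats = multi_scale_patch_features(fmap)  # 1 + 4 + 16 = 21 values
```

Coarse grids summarize the whole face while fine grids keep local detail, which is the usual motivation for fusing several scales.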

Author countries

China, Singapore, United Kingdom
