Recap: Detecting Deepfake Video with Unpredictable Tampered Traces via Recovering Faces and Mapping Recovered Faces

Authors: Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Qian Wang, Zheng Qin, Mike Zheng Shou

Published: 2023-08-19 06:18:11+00:00

AI Summary

Recap is a novel deepfake detection model that addresses a key limitation of existing methods, which rely on specific forgery indicators. It recovers faces from randomly masked regions and then maps the recovered faces, amplifying the inconsistencies between real and fake videos.

Abstract

The exploitation of Deepfake techniques for malicious intentions has driven significant research interest in Deepfake detection. Deepfake manipulations frequently introduce random tampered traces, leading to unpredictable outcomes in different facial regions. However, existing detection methods heavily rely on specific forgery indicators, and as forgery techniques improve, these traces become increasingly randomized, degrading the performance of methods that depend on specific forgery traces. To address this limitation, we propose Recap, a novel Deepfake detection model that exposes unspecific facial-part inconsistencies by recovering faces and enlarges the differences between real and fake faces by mapping the recovered faces. In the recovering stage, the model randomly masks regions of interest (ROIs) and reconstructs real faces without unpredictable tampered traces, resulting in relatively good recovery for real faces but poor recovery for fake faces. In the mapping stage, the output of the recovery phase serves as supervision to guide the facial mapping process. This mapping process strategically emphasizes the mapping of fake faces with poor recovery, further deteriorating their representation, while enhancing and refining the mapping of real faces with good representation. As a result, the approach significantly amplifies the discrepancies between real and fake videos. Our extensive experiments on standard benchmarks demonstrate that Recap is effective in multiple scenarios.
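
The recovering stage described above lends itself to a short illustration. The following Python/PyTorch sketch shows one way to randomly mask facial regions of interest and score a face by its masked-region reconstruction error; the `recovery_net` model and the landmark-derived ROI boxes are illustrative assumptions, not the paper's actual interface.

import torch
import torch.nn.functional as F

def mask_random_rois(face, roi_boxes, mask_prob=0.5):
    # face: (C, H, W) tensor; roi_boxes: list of (x1, y1, x2, y2) pixel
    # boxes, e.g. eyes/nose/mouth from any landmark detector (an
    # assumption; the paper's exact ROI definition may differ).
    masked = face.clone()
    mask = torch.zeros_like(face[:1])          # records which pixels were hidden
    for (x1, y1, x2, y2) in roi_boxes:
        if torch.rand(1).item() < mask_prob:   # each ROI is masked at random
            masked[:, y1:y2, x1:x2] = 0.0
            mask[:, y1:y2, x1:x2] = 1.0
    return masked, mask

def recovery_error(recovery_net, face, roi_boxes):
    # Reconstruction error restricted to the masked regions. Per the
    # abstract, real faces should score low and tampered faces high.
    masked, mask = mask_random_rois(face, roi_boxes)
    recon = recovery_net(masked.unsqueeze(0)).squeeze(0)
    err = F.mse_loss(recon, face, reduction="none") * mask
    return err.sum() / mask.sum().clamp(min=1.0)

In practice this error would feed the subsequent mapping stage rather than a fixed threshold, but the real/fake asymmetry it exposes is the core idea of the recovering stage.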

Key findings
Recap demonstrates high accuracy in deepfake detection across various datasets, outperforming several state-of-the-art methods. Its two-stage approach proves effective in amplifying inconsistencies, and the meta-learning strategy enhances generalization to unseen deepfake generation techniques. The ablation study validates the effectiveness of the proposed masking strategy and the two-stage architecture.
Approach
Recap uses a two-stage approach. The first stage, "Recovering," trains a masked autoencoder to reconstruct faces from randomly masked regions of interest; since the model learns to recover real faces, fake faces reconstruct poorly, which highlights the inconsistencies in deepfakes. The second stage, "Mapping," maps the recovered faces under the supervision of the recovery output, further amplifying the differences between real and fake videos; a meta-learning strategy helps the mapping generalize to unseen forgery methods (one plausible instantiation is sketched below).
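
The summary does not spell out the meta-learning recipe, so the sketch below is a hedged, first-order MAML-style reading in which each forgery method is treated as a task; `mapping_net`, the task list, and the support/query split are illustrative assumptions rather than the paper's algorithm.

import copy
import torch
import torch.nn.functional as F

def meta_step(mapping_net, outer_opt, tasks, inner_lr=1e-3):
    # One first-order meta-update. `tasks` is a list of
    # (support_batch, query_batch) pairs, one per forgery method,
    # where each batch is (recovered_faces, labels).
    outer_opt.zero_grad()
    for (xs, ys), (xq, yq) in tasks:
        fast = copy.deepcopy(mapping_net)              # task-specific clone
        loss_s = F.cross_entropy(fast(xs), ys)         # adapt on support set
        grads = torch.autograd.grad(loss_s, fast.parameters())
        with torch.no_grad():
            for p, g in zip(fast.parameters(), grads):
                p -= inner_lr * g                      # one inner SGD step
        loss_q = F.cross_entropy(fast(xq), yq)         # evaluate adapted clone
        grads = torch.autograd.grad(loss_q, fast.parameters())
        for p, g in zip(mapping_net.parameters(), grads):
            p.grad = g if p.grad is None else p.grad + g  # first-order meta-grad
    outer_opt.step()

Splitting tasks by generation method (e.g. the manipulations in FaceForensics++) is what would let the meta-learned initialization adapt to forgeries unseen during training.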
Datasets
FaceForensics++, Celeb-DF, WildDeepfake, DFDC
Model(s)
Masked autoencoder (asymmetric encoder-decoder architecture based on Vision Transformers), ResNet-18
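
For concreteness, a minimal wiring of the two listed models might look as follows: a small asymmetric ViT-style encoder-decoder standing in for the masked autoencoder, and torchvision's ResNet-18 as the mapping-stage classifier. Depths, dimensions, and the patch-to-image reassembly between the two networks are placeholders, not the paper's configuration.

import torch.nn as nn
from torchvision.models import resnet18

class TinyMAE(nn.Module):
    # Asymmetric encoder-decoder: a deeper Transformer encoder and a
    # shallower decoder, echoing the MAE design. (The original MAE encodes
    # only the visible patches; that detail is omitted here for brevity.)
    def __init__(self, dim=256, patch=16):
        super().__init__()
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True), num_layers=6)
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True), num_layers=2)
        self.head = nn.Linear(dim, patch * patch * 3)  # token -> pixel patch

    def forward(self, tokens):                         # tokens: (B, N, dim)
        return self.head(self.decoder(self.encoder(tokens)))

recovery_net = TinyMAE()
mapping_net = resnet18(weights=None, num_classes=2)    # real vs. fake logits

The recovered patches would be reassembled into face images before entering the ResNet-18, so the mapping stage operates on faces rather than token sequences.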
Author countries
China, Singapore