Delocate: Detection and Localization for Deepfake Videos with Randomly-Located Tampered Traces

Authors: Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Qian Wang, Zheng Qin, Mike Zheng Shou

Published: 2024-01-24 15:14:05+00:00

AI Summary

Delocate is a two-stage deepfake detection model that addresses the limitations of existing methods in detecting unknown domain deepfakes and accurately localizing tampered regions. It achieves this by first recovering masked regions of real faces to learn consistent facial features and then localizing tampered regions in fake faces by leveraging the discrepancies in reconstruction quality.

Abstract

Deepfake videos are becoming increasingly realistic, showing few tampering traces on facial areas that vary between frames. Consequently, existing Deepfake detection methods struggle to detect unknown domain Deepfake videos while accurately locating the tampered region. To address this limitation, we propose Delocate, a novel Deepfake detection model that can both recognize and localize unknown domain Deepfake videos. Our method consists of two stages named recovering and localization. In the recovering stage, the model randomly masks regions of interest (ROIs) and reconstructs real faces without tampering traces, leading to a relatively good recovery effect for real faces and a poor recovery effect for fake faces. In the localization stage, the output of the recovery phase and the forgery ground truth mask serve as supervision to guide the forgery localization process. This process strategically emphasizes the recovery phase of fake faces with poor recovery, facilitating the localization of tampered regions. Our extensive experiments on four widely used benchmark datasets demonstrate that Delocate not only excels in localizing tampered areas but also enhances cross-domain detection performance.


Key findings
Delocate excels at localizing tampered areas and improves cross-domain detection performance compared to existing methods across multiple datasets. It achieves higher AUC and lower EER scores in various unknown domain detection scenarios, indicating robustness to unseen forgery patterns. The ablation study shows that both the recovering and localization stages contribute significantly to the overall performance.
Approach
Delocate uses a two-stage approach. The first stage (recovering) uses a masked autoencoder trained on real faces to learn consistent facial features; fake faces reconstruct poorly. The second stage (localization) leverages the reconstruction differences and ground truth masks to pinpoint tampered regions, enhancing cross-domain detection.
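The core intuition of the two stages can be illustrated with a minimal sketch. This is not the paper's implementation (which uses a Vision Transformer masked autoencoder with space-time attention); the patch size, mask ratio, and helper names below are illustrative assumptions. It shows stage 1 (randomly masking square ROI patches of a face) and stage 2 (scoring each patch by reconstruction error, where high error hints at tampering):

```python
import numpy as np

def random_mask_rois(face, patch=16, mask_ratio=0.5, rng=None):
    """Stage-1 sketch: randomly zero out square patches of a face image.

    `face` is a (H, W) array; the real model reconstructs the masked
    regions, but here masking alone illustrates the input preparation.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    h, w = face.shape
    masked = face.copy()
    mask = np.zeros_like(face, dtype=bool)
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            if rng.random() < mask_ratio:  # drop this patch
                masked[i:i + patch, j:j + patch] = 0.0
                mask[i:i + patch, j:j + patch] = True
    return masked, mask

def localization_map(face, reconstruction, patch=16):
    """Stage-2 sketch: per-patch mean squared reconstruction error.

    Patches that reconstruct poorly (high error) are candidate
    tampered regions; real-face patches should score low.
    """
    h, w = face.shape
    err = np.zeros((h // patch, w // patch))
    for i in range(h // patch):
        for j in range(w // patch):
            blk = (slice(i * patch, (i + 1) * patch),
                   slice(j * patch, (j + 1) * patch))
            err[i, j] = np.mean((face[blk] - reconstruction[blk]) ** 2)
    return err
```

In Delocate itself, the reconstruction comes from an autoencoder trained only on real faces, and the error map is further supervised by the forgery ground truth mask; the sketch above just makes the "poor recovery implies tampering" signal concrete.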
Datasets
FaceForensics++, Celeb-DF, DeeperForensics-1.0, DFDC
Model(s)
Masked autoencoder (asymmetric encoder-decoder architecture with Vision Transformers and joint space-time attention), ResNet-18 (first three residual blocks), SENet, UNet, SCSE Module
Author countries
China, Singapore