Two-branch Recurrent Network for Isolating Deepfakes in Videos

Authors: Iacopo Masi, Aditya Killekar, Royston Marian Mascarenhas, Shenoy Pratik Gurudatt, Wael AbdAlmageed

Published: 2020-08-08 01:38:56+00:00

AI Summary

This paper proposes a two-branch recurrent network for deepfake detection in videos. The network isolates manipulated faces by amplifying artifacts while suppressing high-level face content, using a Laplacian of Gaussian (LoG) filter as a bottleneck layer and a novel loss function that compresses the variability of natural faces while pushing manipulated faces away in the feature space. The approach shows promising results on the FaceForensics++, Celeb-DF, and DFDC preview benchmarks.
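The compress-and-push-away objective can be pictured as a one-class, margin-based loss. A minimal NumPy sketch, assuming natural faces are pulled toward a learned center `c` and manipulated faces are pushed at least a margin `m` away from it (the names `c` and `m` and the exact squared-hinge form are illustrative, not the paper's exact formulation):

```python
import numpy as np

def isolation_loss(features, labels, center, margin=1.0):
    """Illustrative compress-and-push-away objective.

    features: (N, D) embeddings; labels: (N,) with 0 = natural, 1 = manipulated.
    Natural faces are pulled toward `center` (compressing their variability);
    manipulated faces are pushed at least `margin` away from it. This is a
    hedged stand-in for the paper's loss, not its exact formulation.
    """
    dists = np.linalg.norm(features - center, axis=1)            # distance to center
    pull = np.where(labels == 0, dists**2, 0.0)                  # compress natural faces
    push = np.where(labels == 1, np.maximum(0.0, margin - dists)**2, 0.0)
    return float(np.mean(pull + push))
```

Under this sketch, a natural face sitting at the center contributes zero loss, while a manipulated face that falls inside the margin is penalized until it is pushed out, which is the intuition behind separating unrealistic samples in feature space.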

Abstract

The current spike of hyper-realistic faces artificially generated using deepfakes calls for media forensics solutions that are tailored to video streams and work reliably with a low false alarm rate at the video level. We present a method for deepfake detection based on a two-branch network structure that isolates digitally manipulated faces by learning to amplify artifacts while suppressing the high-level face content. Unlike current methods that extract spatial frequencies as a preprocessing step, we propose a two-branch structure: one branch propagates the original information, while the other branch suppresses the face content yet amplifies multi-band frequencies using a Laplacian of Gaussian (LoG) as a bottleneck layer. To better isolate manipulated faces, we derive a novel cost function that, unlike regular classification, compresses the variability of natural faces and pushes away the unrealistic facial samples in the feature space. Our two novel components show promising results on the FaceForensics++, Celeb-DF, and Facebook's DFDC preview benchmarks, when compared to prior work. We then offer a full, detailed ablation study of our network architecture and cost function. Finally, although achieving strong results at a very low false alarm rate remains challenging, our study shows that we can achieve good video-level performance when cross-testing, in terms of video-level AUC.


Key findings
The proposed method achieves promising results on multiple deepfake detection benchmarks, including good cross-dataset generalization as measured by video-level AUC. Both novel components (the two-branch architecture and the loss function) contribute to this improved performance.
Approach
The authors employ a two-branch recurrent network: one branch propagates the original video frames, while the other suppresses high-level face content and amplifies multi-band frequencies via a Laplacian of Gaussian (LoG) filter used as a bottleneck layer. A novel loss function compresses the representation of natural faces and pushes manipulated faces away in the feature space, and a bidirectional LSTM models temporal information across frames.
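The LoG bottleneck's role can be illustrated with a small NumPy sketch, assuming a single fixed scale sigma (the paper's layer spans multiple frequency bands; the kernel size and sigma below are illustrative choices, not the paper's):

```python
import numpy as np

def log_kernel(size=9, sigma=1.4):
    """Laplacian-of-Gaussian kernel: a band-pass filter that suppresses
    smooth, low-frequency face content and responds to edge-like artifacts.
    Size and sigma are illustrative, not the paper's settings."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = x**2 + y**2
    k = (r2 / (2 * sigma**2) - 1) * np.exp(-r2 / (2 * sigma**2))
    return k - k.mean()  # zero-sum: constant regions produce zero response

def filter2d(image, kernel):
    """Naive 'valid'-mode 2-D correlation, for illustration only."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out
```

Because the kernel sums to zero, a smooth face region yields near-zero response while sharp blending boundaries (a common deepfake artifact) produce strong activations, which is the frequency-suppression behavior the suppression branch relies on.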
Datasets
FaceForensics++, Celeb-DF, and Facebook's DFDC preview benchmarks
Model(s)
Two-branch recurrent network with DenseBlocks, a Laplacian of Gaussian (LoG) filter as a bottleneck layer, and a bidirectional LSTM.
Author countries
USA