DeepFake Detection Based on the Discrepancy Between the Face and its Context

Authors: Yuval Nirkin, Lior Wolf, Yosi Keller, Tal Hassner

Published: 2020-08-27 17:04:46+00:00

AI Summary

This paper proposes a deepfake detection method that leverages discrepancies between a face and its surrounding context in images. It uses two separate networks to identify the subject, one from the face region and one from its context, and compares their outputs to detect inconsistencies indicative of manipulation. This approach achieves state-of-the-art results on several benchmark datasets.

Abstract

We propose a method for detecting face swapping and other identity manipulations in single images. Face swapping methods, such as DeepFake, manipulate the face region, aiming to adjust the face to the appearance of its context, while leaving the context unchanged. We show that this modus operandi produces discrepancies between the two regions. These discrepancies offer exploitable telltale signs of manipulation. Our approach involves two networks: (i) a face identification network that considers the face region bounded by a tight semantic segmentation, and (ii) a context recognition network that considers the face context (e.g., hair, ears, neck). We describe a method which uses the recognition signals from our two networks to detect such discrepancies, providing a complementary detection signal that improves conventional real vs. fake classifiers commonly used for detecting fake images. Our method achieves state of the art results on the FaceForensics++, Celeb-DF-v2, and DFDC benchmarks for face manipulation detection, and even generalizes to detect fakes produced by unseen methods.


Key findings
The proposed method achieves state-of-the-art results on FaceForensics++, Celeb-DF-v2, and DFDC benchmarks. It also generalizes well to detect deepfakes created by unseen methods, outperforming existing approaches, particularly when artifacts are less apparent. The combination of face and context analysis proves effective in detecting manipulation.
Approach
The method uses two networks: one for face identification and another for recognition from the face's context (hair, ears, neck). A discrepancy score computed between the two networks' outputs serves as an additional feature for a final classifier that distinguishes real from fake images. This discrepancy signal complements conventional real/fake classifiers.
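The discrepancy idea can be sketched as a distance between the identity embeddings produced by the two networks. The following is a minimal, hypothetical illustration (the function name and the use of cosine distance are assumptions, not the paper's exact formulation): for a real image both networks should agree on identity, so a large distance suggests the face region was swapped.

```python
import math

def discrepancy(face_emb, ctx_emb):
    """Cosine distance between face-region and context-region identity embeddings.

    Hypothetical sketch of the paper's discrepancy signal: low for real
    images (both networks see the same identity), high when the face was
    replaced but its context was not.
    """
    dot = sum(f * c for f, c in zip(face_emb, ctx_emb))
    norm_f = math.sqrt(sum(f * f for f in face_emb))
    norm_c = math.sqrt(sum(c * c for c in ctx_emb))
    return 1.0 - dot / (norm_f * norm_c)
```

In practice the embeddings would come from the two recognition networks; here plain Python lists stand in for them.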
Datasets
FaceForensics++, Celeb-DF-v2, DFDC, VGGFace2, LFW, and a custom dataset created using FaceForensics++ data and additional swapping techniques.
Model(s)
Xception network (for face and context recognition), U-Net (for face segmentation), and additional binary Xception networks for detecting specific manipulation types (face swapping and face reenactment). A final classifier (with a logistic loss function) combines the outputs.
Author countries
Israel, United States