ID-Reveal: Identity-aware DeepFake Video Detection

Authors: Davide Cozzolino, Andreas Rössler, Justus Thies, Matthias Nießner, Luisa Verdoliva

Published: 2020-12-04 10:43:16+00:00

AI Summary

ID-Reveal is a deepfake detection method that learns temporal facial features specific to how a person moves while talking, using metric learning and adversarial training. It doesn't require training data of fakes, only real videos, and is robust to post-processing effects, showing improved generalization and robustness to low-quality videos compared to state-of-the-art methods.

Abstract

A major challenge in DeepFake forgery detection is that state-of-the-art algorithms are mostly trained to detect a specific fake method. As a result, these approaches show poor generalization across different types of facial manipulations, e.g., from face swapping to facial reenactment. To this end, we introduce ID-Reveal, a new approach that learns temporal facial features, specific to how a person moves while talking, by means of metric learning coupled with an adversarial training strategy. The advantage is that we do not need any training data of fakes, but only train on real videos. Moreover, we utilize high-level semantic features, which enables robustness to widespread and disruptive forms of post-processing. We perform a thorough experimental analysis on several publicly available benchmarks. Compared to the state of the art, our method improves generalization and is more robust to low-quality videos, which are usually spread over social networks. In particular, we obtain an average improvement of more than 15% in terms of accuracy for facial reenactment on highly compressed videos.


Key findings
ID-Reveal improves on state-of-the-art methods, with an average accuracy gain of more than 15% for facial reenactment on highly compressed videos. It generalizes better across manipulation types (for example, from face swapping to facial reenactment) and is more robust to the low-quality videos commonly spread over social networks. The gains come from using identity-specific temporal features rather than artifacts of a particular forgery method.
Approach
ID-Reveal uses metric learning to learn temporal facial features from real videos only. It trains a Temporal ID Network to embed these features, together with a 3DMM Generative Network in an adversarial fashion, so that the embedding focuses on temporal behavior. At test time, the embedding of a test video is compared with the embedding of a reference video to detect inconsistencies.
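The test-time comparison described above can be illustrated with a minimal sketch. It assumes the embeddings are already available as plain lists of floats. The squared Euclidean distance, the minimum over several reference embeddings, the threshold value, and all function names are illustrative assumptions, not details taken from the paper.

```python
from math import fsum


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length embeddings."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    return fsum((x - y) ** 2 for x, y in zip(a, b))


def min_reference_distance(test_embedding, reference_embeddings):
    """Smallest distance from the test embedding to any reference embedding."""
    if not reference_embeddings:
        raise ValueError("at least one reference embedding is required")
    return min(squared_distance(test_embedding, ref) for ref in reference_embeddings)


def is_identity_inconsistent(test_embedding, reference_embeddings, threshold):
    """Flag the test video when it is far from every reference embedding.

    The threshold is a hypothetical tuning parameter (an assumption of this
    sketch); in practice it would be chosen on held-out real videos.
    """
    return min_reference_distance(test_embedding, reference_embeddings) > threshold


# Toy embeddings, invented for illustration only.
reference_embeddings = [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]]
consistent_embedding = [0.05, 0.95, 0.0]    # close to the references
inconsistent_embedding = [1.0, 0.0, 0.0]    # far from every reference
THRESHOLD = 0.5                             # hypothetical value

if __name__ == "__main__":
    for name, emb in [("consistent", consistent_embedding),
                      ("inconsistent", inconsistent_embedding)]:
        flagged = is_identity_inconsistent(emb, reference_embeddings, THRESHOLD)
        print(name, "flagged" if flagged else "accepted")
```

Because the decision depends only on distances to real reference footage of the person, a check of this kind needs no examples of fakes, which matches the training setup described above.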
Datasets
Not named in the abstract, which refers only to several publicly available benchmarks. Training uses a large dataset of real videos. The experimental analysis section of the paper references the DFDC and FaceForensics++ datasets.
Model(s)
Temporal ID Network, 3DMM Generative Network, 3D Morphable Model (3DMM) for feature extraction.
Author countries
Italy, Germany