DeepFakes: a New Threat to Face Recognition? Assessment and Detection

Authors: Pavel Korshunov, Sebastien Marcel

Published: 2018-12-20 16:36:39+00:00

AI Summary

This paper introduces the first publicly available dataset of Deepfake videos generated using GANs from the VidTIMIT database, assessing the vulnerability of state-of-the-art face recognition systems to these videos and evaluating several Deepfake detection methods.

Abstract

It is becoming increasingly easy to automatically replace the face of one person in a video with the face of another person by using a pre-trained generative adversarial network (GAN). Recent public scandals, e.g., the faces of celebrities being swapped onto pornographic videos, call for automated ways to detect these Deepfake videos. To help develop such methods, in this paper we present the first publicly available set of Deepfake videos generated from videos of the VidTIMIT database. We used open-source software based on GANs to create the Deepfakes, and we emphasize that training and blending parameters can significantly impact the quality of the resulting videos. To demonstrate this impact, we generated videos with low and high visual quality (320 videos each) using differently tuned parameter sets. We show that state-of-the-art face recognition systems based on VGG and Facenet neural networks are vulnerable to Deepfake videos, with 85.62% and 95.00% false acceptance rates respectively, which means methods for detecting Deepfake videos are necessary. Among several baseline approaches we considered, the audio-visual approach based on lip-sync inconsistency detection was not able to distinguish Deepfake videos. The best-performing method, which is based on visual quality metrics and is often used in the presentation attack detection domain, resulted in an 8.97% equal error rate on high-quality Deepfakes. Our experiments demonstrate that GAN-generated Deepfake videos are challenging for both face recognition systems and existing detection methods, and further development of face swapping technology will make them even more so.
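
The false acceptance rates and the equal error rate quoted above are both threshold-based metrics. The following minimal sketch (in Python, with random placeholder scores rather than the paper's data) illustrates how an equal error rate is typically computed from a detector's output scores; it is not the authors' evaluation code.

    # Sketch: equal error rate (EER) of a Deepfake detector from its output scores.
    # Scores here are random placeholders, not results from the paper.
    import numpy as np
    from sklearn.metrics import roc_curve

    rng = np.random.default_rng(0)
    real_scores = rng.normal(0.3, 0.10, 500)   # detector scores on real videos
    fake_scores = rng.normal(0.7, 0.15, 500)   # detector scores on Deepfake videos

    labels = np.concatenate([np.zeros_like(real_scores), np.ones_like(fake_scores)])
    scores = np.concatenate([real_scores, fake_scores])

    # EER: operating point where the false positive rate equals the false negative rate.
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fpr - fnr))
    eer = (fpr[idx] + fnr[idx]) / 2
    print(f"EER: {eer:.2%}")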


Key findings
State-of-the-art face recognition systems showed high false acceptance rates (up to 95%) when presented with high-quality Deepfakes. Lip-sync-based detection failed, while methods based on image quality metrics combined with an SVM achieved the best performance (8.97% EER on high-quality Deepfakes). The findings highlight how challenging high-quality Deepfakes are to detect.
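
The image-quality-metrics-plus-SVM baseline can be approximated as a per-frame feature extractor feeding a binary SVM. In the sketch below, compute_iqm_features is a hypothetical placeholder for the IQM features (blur, noise, and similar per-frame measures), so this only illustrates the classifier stage under those assumptions, not the authors' exact pipeline.

    # Sketch of a frame-level Deepfake detector in the spirit of the IQM + SVM
    # baseline: per-frame image-quality features are fed to a binary SVM.
    # compute_iqm_features() is a hypothetical placeholder, not the paper's code.
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    def compute_iqm_features(frame: np.ndarray) -> np.ndarray:
        """Placeholder: return a vector of simple image-quality measures for one
        HxWx3 frame (mean intensity, contrast, and a rough sharpness statistic)."""
        gray = frame.mean(axis=2)
        return np.array([gray.mean(), gray.std(), np.abs(np.diff(gray, axis=0)).mean()])

    def frames_to_features(frames):
        return np.vstack([compute_iqm_features(f) for f in frames])

    def train_detector(real_frames, fake_frames):
        """Fit a binary SVM on IQM features (label 0 = real, 1 = Deepfake)."""
        X = frames_to_features(list(real_frames) + list(fake_frames))
        y = np.array([0] * len(real_frames) + [1] * len(fake_frames))
        clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
        clf.fit(X, y)
        return clf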
Approach
The authors generated Deepfake videos using open-source GAN-based software with differently tuned parameters to create low- and high-quality versions. They evaluated the vulnerability of the VGG and Facenet face recognition systems to these videos and tested several detection methods, including lip-sync inconsistency detection and classifiers based on image quality metrics.
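
Conceptually, the vulnerability test treats each Deepfake as a probe against the target identity's enrolled template and counts how often it is accepted. A hedged sketch, assuming face embeddings (e.g., from a VGG or Facenet model) have already been extracted and using cosine similarity as the comparison score, might look as follows; the function names are illustrative, not the authors' code.

    # Sketch of the vulnerability measurement: a Deepfake is "falsely accepted" if
    # its face embedding is close enough to the target identity's enrolled template.
    # Embeddings are assumed to be precomputed with some face recognition model.
    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def false_acceptance_rate(deepfake_embeddings, target_template, threshold):
        """Fraction of Deepfake probes whose similarity to the target's enrolled
        template exceeds the verification threshold."""
        accepted = [cosine_similarity(e, target_template) >= threshold
                    for e in deepfake_embeddings]
        return sum(accepted) / len(accepted)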
Datasets
VidTIMIT database; a new, publicly available dataset of Deepfake videos (320 low-quality and 320 high-quality) generated by the authors.
Model(s)
VGG, Facenet, LSTM (for lip-sync detection), PCA, LDA, SVM (used with image quality metrics).
Author countries
Switzerland