Detection of fake faces in videos

Authors: M. Shamanth, Russel Mathias, Dr Vijayalakshmi MN

Published: 2022-01-28 11:29:07+00:00

AI Summary

This paper proposes a deepfake video detection system that uses a pretrained BlazeFace model for face detection and an ensemble of ResNet and Xception neural networks for deepfake classification. The model, optimized using focal loss, achieved a peak k-fold accuracy of around 91% on a specific dataset.

Abstract

Deep learning methods have been used to build applications that threaten privacy, democracy and national security, and that could further amplify malicious activity. One such application in recent times is synthesized video of famous personalities. According to Forbes, fake videos generated by Generative Adversarial Networks (GANs) are growing exponentially every year, and the organization Deeptrace estimated an 84% increase in deepfakes from 2018 to 2019. GANs are used to generate and modify human faces; most existing fake videos, an estimated 96%, are of a prurient, non-consensual nature, and some impersonate personalities for cyber crime. In this paper, available video datasets are identified, the pretrained BlazeFace model is used to detect faces, and an ensemble of ResNet and Xception neural networks is trained on the dataset to detect fake faces in videos. The model is optimized over loss and log loss values and evaluated by its F1 score. Over a sample of data, focal loss is observed to give better accuracy, F1 score and loss, with the gamma of the focal loss becoming a hyperparameter. This yields a k-fold accuracy of around 91% at its peak in a training cycle, with real-world accuracy subject to change over time as the model decays.


Key findings
The ensembled ResNet and Xception model, trained with focal loss, achieved a peak k-fold accuracy of approximately 91%. The study notes that model performance is subject to decay over time and that hyperparameter tuning (e.g., the focal loss gamma) influences accuracy. Overfitting was observed with increased training epochs and data, indicating the need for careful hyperparameter optimization.
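To make the role of gamma concrete, the following is a minimal sketch of the standard binary focal loss for a single prediction. It is not the authors' implementation: the function names, the label convention (1 = fake, 0 = real), the default gamma of 2.0 and the clipping constant are assumptions chosen for illustration, and the optional class-weighting term (alpha) is omitted.

```python
import math


def binary_focal_loss(p_fake, label, gamma=2.0, eps=1e-7):
    """Focal loss for one prediction (illustrative sketch).

    p_fake: predicted probability that the face is fake.
    label:  1 for fake, 0 for real (assumed convention).
    gamma:  focusing parameter; gamma = 0 reduces to ordinary log loss.
    """
    # Clip the probability so that log() never receives 0.
    p = min(max(p_fake, eps), 1.0 - eps)
    # p_t is the probability the model assigned to the true class.
    p_t = p if label == 1 else 1.0 - p
    # The (1 - p_t) ** gamma factor shrinks the loss of well-classified examples.
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def log_loss(p_fake, label, eps=1e-7):
    """Ordinary binary log loss, expressed as focal loss with gamma = 0."""
    return binary_focal_loss(p_fake, label, gamma=0.0, eps=eps)


# A confident, correct prediction is down-weighted by focal loss;
# a confident, wrong prediction keeps most of its log loss.
easy = binary_focal_loss(0.9, 1)
hard = binary_focal_loss(0.1, 1)
```

With gamma set to 0 the modulating factor equals 1 and the expression is plain log loss, so gamma is the hyperparameter that controls how strongly training concentrates on hard examples.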
Approach
The approach uses BlazeFace for face detection in video frames. A combined ResNet and Xception neural network is then trained on extracted face features to classify the frames as real or fake, using focal loss for optimization and evaluating performance using F1 score.
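The classification and evaluation steps can be sketched as follows. The summary does not state how the ResNet and Xception outputs are combined, so the weighted average here is an assumption, as are the 0.5 decision threshold and all function names. Face detection with BlazeFace and the networks themselves are out of scope; the inputs are taken to be per-frame fake probabilities already produced by each model.

```python
def ensemble_probability(p_resnet, p_xception, weight=0.5):
    """Combine two fake probabilities by weighted average (assumed scheme)."""
    return weight * p_resnet + (1.0 - weight) * p_xception


def classify_frames(resnet_probs, xception_probs, threshold=0.5):
    """Label each frame 1 (fake) or 0 (real) from the ensembled probability."""
    return [
        1 if ensemble_probability(pr, px) >= threshold else 0
        for pr, px in zip(resnet_probs, xception_probs)
    ]


def f1_score(y_true, y_pred):
    """F1 score for the positive (fake) class from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-frame outputs for four frames (made-up numbers).
resnet_probs = [0.9, 0.4, 0.7, 0.2]
xception_probs = [0.8, 0.3, 0.6, 0.1]
labels = [1, 1, 0, 0]

predictions = classify_frames(resnet_probs, xception_probs)
score = f1_score(labels, predictions)
```

F1 is the harmonic mean of precision and recall, so it penalizes a detector that scores well on accuracy only by favouring the majority class.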
Datasets
UNKNOWN
Model(s)
BlazeFace (for face detection), ResNet, Xception (ensembled for deepfake classification)
Author countries
India