Deepfakes Detection with Automatic Face Weighting

Authors: Daniel Mas Montserrat, Hanxiang Hao, S. K. Yarlagadda, Sriram Baireddy, Ruiting Shao, János Horváth, Emily Bartusiak, Justin Yang, David Güera, Fengqing Zhu, Edward J. Delp

Published: 2020-04-25 00:47:42+00:00

AI Summary

This paper proposes a deepfake detection method using a CNN-RNN architecture that extracts visual and temporal features from faces in videos. The method combines automatic face weighting (AFW) with a gated recurrent unit (GRU) to improve accuracy, achieving competitive results on the Deepfake Detection Challenge (DFDC) dataset.

Abstract

Altered and manipulated multimedia is increasingly present and widely distributed via social media platforms. Advanced video manipulation tools enable the generation of highly realistic-looking altered multimedia. While many methods have been presented to detect manipulations, most of them fail when evaluated with data outside of the datasets used in research environments. In order to address this problem, the Deepfake Detection Challenge (DFDC) provides a large dataset of videos containing realistic manipulations and an evaluation system that ensures that methods work quickly and accurately, even when faced with challenging data. In this paper, we introduce a method based on convolutional neural networks (CNNs) and recurrent neural networks (RNNs) that extracts visual and temporal features from faces present in videos to accurately detect manipulations. The method is evaluated with the DFDC dataset, providing competitive results compared to other techniques.


Key findings
The proposed method achieved a balanced accuracy of 91.88% on the DFDC test set. Combining AFW with a GRU improved accuracy over using EfficientNet alone. Adding a boosting network and test-time augmentation further reduced the log-likelihood error to 0.321, placing the method in the top 5% of the DFDC competition.
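For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how they are commonly computed. It assumes that "log-likelihood error" refers to the binary cross-entropy (log loss) and that labels are 1 for manipulated and 0 for real videos. The function names and the clipping constant `eps` are illustrative and not taken from the paper.

```python
import math


def balanced_accuracy(y_true, y_pred):
    """Mean of the recall on manipulated (1) and real (0) videos."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


def log_loss(y_true, probs, eps=1e-15):
    """Average binary cross-entropy; lower is better."""
    total = 0.0
    for t, p in zip(y_true, probs):
        # Clip to avoid log(0) on overconfident predictions (assumed constant).
        p = min(max(p, eps), 1.0 - eps)
        total += t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return -total / len(y_true)
```

Balanced accuracy is less affected by class imbalance than plain accuracy, while log loss heavily penalizes confident wrong predictions.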
Approach
The approach uses MTCNN for face detection, EfficientNet-b5 for feature extraction from detected faces, and an Automatic Face Weighting (AFW) layer combined with a GRU to generate a video-level prediction of authenticity. A boosting network and test-time augmentation further enhance the robustness of the predictions.
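The face-weighting idea can be illustrated with a short sketch. It assumes that each detected face region yields one logit and one raw weight, and that the video-level probability is the sigmoid of the weighted average of the logits. The ReLU on the weights, the stabilizing constant `eps`, and all function names are assumptions made for illustration; the summary does not specify them, and the GRU stage is omitted.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def face_weighted_probability(logits, raw_weights, eps=1e-8):
    """Combine per-face logits into one video-level probability.

    Hypothetical sketch: weights are made non-negative with a ReLU,
    then used to form a weighted average of the logits.
    """
    weights = [max(0.0, w) for w in raw_weights]  # ReLU (assumed)
    weighted_sum = sum(w * l for w, l in zip(weights, logits))
    total = sum(weights)
    # eps keeps the division defined when every weight is zero.
    return sigmoid(weighted_sum / (total + eps))
```

With this kind of weighting, faces that receive a weight of zero (for example, false detections) do not influence the video-level prediction.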
Datasets
Deepfake Detection Challenge (DFDC) dataset
Model(s)
MTCNN, EfficientNet-b5, Gated Recurrent Unit (GRU)
Author countries
USA