Cost Sensitive Optimization of Deepfake Detector

Authors: Ivan Kukanov, Janne Karttunen, Hannu Sillanpää, Ville Hautamäki

Published: 2020-12-08 04:06:02+00:00

AI Summary

This paper proposes a cost-sensitive optimization approach for deepfake detection, arguing that the task should be treated as a screening problem with imbalanced classes. The authors fine-tune deepfake detection models with the Maximal Figure-of-Merit (MFoM) framework to directly optimize cost-sensitive metrics such as the Detection Cost Function (DCF).

Abstract

Manipulated videos have existed since the invention of cinema, but generating manipulated videos that can fool the viewer has been a time-consuming endeavor. With the dramatic improvements in deep generative modeling, generating believable-looking fake videos has become a reality. In the present work, we concentrate on so-called deepfake videos, where the source face is swapped with the target's. We argue that the deepfake detection task should be viewed as a screening task, where the user, such as a video streaming platform, screens a large number of videos daily. Clearly, only a small fraction of the uploaded videos are deepfakes, so detection performance needs to be measured in a cost-sensitive way. Preferably, the model parameters should also be estimated in the same way. This is precisely what we propose here.


Key findings
Fine-tuning the CNN model with MFoM reduced the Equal Error Rate (EER) from 8.07% to 6.03% on the test set. However, EER rose to over 30% on a self-collected evaluation set of high-quality deepfakes from YouTube, highlighting the difficulty of detecting deepfakes produced by generation methods not seen during training.
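For readers unfamiliar with the metric, the EER is the error rate at the decision threshold where the miss rate (deepfakes accepted as genuine) equals the false-alarm rate (genuine videos flagged as deepfakes). The following is a minimal sketch of how it can be estimated from detector scores. The function names, the label convention (1 = deepfake, 0 = genuine), and the toy scores are assumptions made for illustration and do not come from the paper.

```python
import numpy as np

def error_rates(scores, labels):
    """Miss and false-alarm rates at every candidate threshold.

    Assumed convention: labels are 1 for deepfake and 0 for genuine,
    and a higher score means "more likely deepfake".
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    thresholds = np.unique(scores)
    # Add one threshold above the maximum so "flag nothing" is covered.
    thresholds = np.append(thresholds, thresholds[-1] + 1.0)
    fake = scores[labels == 1]
    real = scores[labels == 0]
    # A video is flagged as deepfake when score >= threshold.
    p_miss = np.array([(fake < t).mean() for t in thresholds])
    p_fa = np.array([(real >= t).mean() for t in thresholds])
    return thresholds, p_miss, p_fa

def equal_error_rate(scores, labels):
    """Approximate EER: the point where miss and false-alarm rates are closest."""
    _, p_miss, p_fa = error_rates(scores, labels)
    i = np.argmin(np.abs(p_miss - p_fa))
    return 0.5 * (p_miss[i] + p_fa[i])

# Toy example with made-up scores (not data from the paper).
toy_scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_eer = equal_error_rate(toy_scores, toy_labels)
```

With finite data the two rates rarely cross exactly, so this sketch averages them at the closest point; interpolating between thresholds is a common alternative.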
Approach
The authors address the class imbalance in deepfake detection by using a cost-sensitive approach. They employ the Maximal Figure-of-Merit (MFoM) framework to fine-tune Convolutional Neural Networks (CNNs) and Long Short-Term Memory networks (LSTMs), directly optimizing the Detection Cost Function (DCF) and Equal Error Rate (EER) metrics.
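To make the idea of optimizing a cost-sensitive metric concrete, the sketch below computes a DCF in its commonly used form, C_miss · P_miss · P_fake + C_fa · P_fa · (1 − P_fake), together with a sigmoid-smoothed version that is differentiable in the scores. This illustrates only the general principle of replacing hard error counts with smooth ones. It is not the authors' MFoM formulation, and the prior `p_fake`, the costs, the slope `alpha`, and all function names are assumptions.

```python
import numpy as np

def dcf(scores, labels, threshold, p_fake=0.01, c_miss=1.0, c_fa=1.0):
    """Hard-decision detection cost at a fixed threshold.

    p_fake is an assumed prior probability that an uploaded video is a
    deepfake; c_miss and c_fa are assumed costs of the two error types.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    p_miss = np.mean(scores[labels == 1] < threshold)   # deepfakes missed
    p_fa = np.mean(scores[labels == 0] >= threshold)    # genuine flagged
    return c_miss * p_miss * p_fake + c_fa * p_fa * (1.0 - p_fake)

def smooth_dcf(scores, labels, threshold, alpha=10.0,
               p_fake=0.01, c_miss=1.0, c_fa=1.0):
    """Differentiable surrogate: hard decisions replaced by a sigmoid.

    alpha controls how closely the sigmoid approximates a step function.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Soft "flag as deepfake" decision in (0, 1).
    soft_flag = 1.0 / (1.0 + np.exp(-alpha * (scores - threshold)))
    soft_miss = np.mean(1.0 - soft_flag[labels == 1])
    soft_fa = np.mean(soft_flag[labels == 0])
    return c_miss * soft_miss * p_fake + c_fa * soft_fa * (1.0 - p_fake)

# Toy example with made-up scores (not data from the paper).
demo_scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
demo_labels = [1, 1, 1, 0, 0, 0]
hard_cost = dcf(demo_scores, demo_labels, threshold=0.7)
soft_cost = smooth_dcf(demo_scores, demo_labels, threshold=0.7, alpha=1000.0)
```

Because the smoothed cost varies continuously with the scores, it can serve as a training loss for gradient-based fine-tuning, whereas the hard-decision cost is piecewise constant and gives no useful gradient. With a small `p_fake`, false alarms on genuine videos dominate the cost, which reflects the screening setting described above.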
Datasets
FaceForensics++, Deepfake-TIMIT, and a self-collected YouTube dataset of deepfakes.
Model(s)
CNN (based on MobileNet), LSTM (with InceptionV3 for feature extraction)
Author countries
Finland, Singapore