Detecting Deepfakes with Metric Learning

Authors: Akash Kumar, Arnav Bhavsar

Published: 2020-03-19 09:44:23+00:00

AI Summary

This paper proposes a deepfake detection approach that uses metric learning with a triplet network architecture. The method increases the distance in feature space between the embeddings of real and fake videos, achieving state-of-the-art results on Celeb-DF and on highly compressed Neural Texture videos from the FaceForensics++ dataset.

Abstract

With the arrival of several face-swapping applications such as FaceApp, SnapChat, MixBooth, FaceBlender and many more, the authenticity of digital media content is hanging by a very loose thread. On social media platforms, videos are widely circulated, often at a high compression factor. In this work, we analyze several deep learning approaches to deepfake classification in high-compression scenarios and demonstrate that a proposed approach based on metric learning can be very effective for such classification. Even when using fewer frames per video to assess realism, the metric learning approach with a triplet network architecture proves fruitful: it learns to increase the feature-space distance between the clusters of real and fake video embedding vectors. We validated our approaches on two datasets to analyze their behavior in different environments. We achieved a state-of-the-art AUC score of 99.2% on the Celeb-DF dataset and an accuracy of 90.71% on the highly compressed Neural Texture subset of FaceForensics++. Our approach is especially helpful on social media platforms, where data compression is inevitable.
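The abstract notes that the method assesses a video from a relatively small number of frames. The frame count and the way frame-level outputs are combined are not specified here, so the following is only a hypothetical sketch: it assumes frames are sampled evenly across the clip and that per-frame fake probabilities from some classifier are averaged and compared with a 0.5 threshold. The helper names `sample_frame_indices` and `video_is_fake` are illustrative, not taken from the paper.

```python
from statistics import mean


def sample_frame_indices(num_frames: int, k: int) -> list[int]:
    """Pick up to k frame indices spread evenly across a video of num_frames frames.

    Hypothetical helper: the paper's actual frame-selection rule is not stated.
    """
    if num_frames <= 0 or k <= 0:
        return []
    k = min(k, num_frames)
    step = num_frames / k
    # Take the centre of each of k equal-width segments so the samples cover the whole clip.
    return [int(step * i + step / 2) for i in range(k)]


def video_is_fake(frame_scores: list[float], threshold: float = 0.5) -> bool:
    """Label a video fake if the mean per-frame fake probability exceeds the threshold.

    Mean aggregation and the 0.5 threshold are assumptions for illustration.
    Raises statistics.StatisticsError if frame_scores is empty.
    """
    return mean(frame_scores) > threshold


if __name__ == "__main__":
    print(sample_frame_indices(300, 5))              # [30, 90, 150, 210, 270]
    print(video_is_fake([0.8, 0.7, 0.9, 0.4, 0.6]))  # True (mean is 0.68)
```

Averaging is the simplest aggregation; taking the maximum or a majority vote over frames are common alternatives, and which works best on compressed video is an empirical question.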


Key findings
The proposed metric learning approach achieved a state-of-the-art AUC score of 99.2% on Celeb-DF and an accuracy of 90.71% on highly compressed Neural Texture videos in the FF++ dataset. The method outperforms existing approaches, particularly on low-resolution videos, which suggests it is effective in real-world social media settings where compression is common.
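The two results use different metrics: Celeb-DF is reported as ROC AUC and the Neural Texture result as accuracy. Accuracy depends on a decision threshold, whereas AUC does not: it is the probability that a randomly chosen fake video receives a higher "fake" score than a randomly chosen real one. A minimal pure-Python sketch of that definition, assuming label 1 means fake and higher scores mean more likely fake (this is not the authors' evaluation code; `sklearn.metrics.roc_auc_score` computes the same quantity more efficiently):

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative).

    Labels: 1 = fake (positive), 0 = real (negative). Tied pairs count as half.
    Pairwise O(P * N) version, written for clarity rather than speed.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one fake and one real example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Three of the four (fake, real) pairs are ranked correctly.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Under this definition, an AUC of 99.2% means a fake video outscores a real one in about 99.2% of fake/real pairs, regardless of where the decision threshold is later set.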
Approach
The authors use a triplet network for metric learning. The network learns to separate the embeddings of real and fake videos in feature space, which improves classification accuracy, particularly under high compression. Face embeddings are generated using FaceNet, and the network is trained with a triplet loss.
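The triplet objective can be sketched in a few lines. This is a minimal, evaluation-only sketch in pure Python under stated assumptions: embeddings are L2-normalised and compared by squared Euclidean distance as in FaceNet, the margin of 0.2 is an assumed value, and triplets come from a hypothetical uniform sampler that takes the anchor and positive from one class (real or fake) and the negative from the other. The paper's actual margin, distance and triplet-selection strategy are not given here, and training would minimise this loss by gradient descent over the network weights, which the sketch does not do.

```python
import math
import random

Vector = list[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale an embedding to unit length (FaceNet constrains embeddings to the unit hypersphere)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector, margin: float = 0.2) -> float:
    """Hinge loss: zero once d(anchor, negative) exceeds d(anchor, positive) by at least `margin`."""
    return max(sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin, 0.0)


def sample_triplet(real: list[Vector], fake: list[Vector],
                   rng: random.Random) -> tuple[Vector, Vector, Vector]:
    """Hypothetical sampler: anchor and positive share a class, the negative comes from the other class."""
    same, other = (real, fake) if rng.random() < 0.5 else (fake, real)
    anchor, positive = rng.sample(same, 2)  # needs at least two embeddings per class
    negative = rng.choice(other)
    return anchor, positive, negative


def mean_triplet_loss(real: list[Vector], fake: list[Vector], n_triplets: int = 100,
                      margin: float = 0.2, seed: int = 0) -> float:
    """Average triplet loss over randomly sampled triplets of normalised embeddings."""
    real = [l2_normalize(v) for v in real]
    fake = [l2_normalize(v) for v in fake]
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_triplets):
        a, p, n = sample_triplet(real, fake, rng)
        total += triplet_loss(a, p, n, margin)
    return total / n_triplets


if __name__ == "__main__":
    real = [[1.0, 0.0], [0.9, 0.1], [0.95, -0.05]]
    fake = [[-1.0, 0.0], [-0.9, -0.1], [-0.95, 0.05]]
    print(mean_triplet_loss(real, fake))          # 0.0: clusters already separated by more than the margin
    mixed = [[1.0, 0.0], [0.0, 1.0]]
    print(mean_triplet_loss(mixed, mixed) > 0.0)  # True: overlapping classes are penalised
```

With well-separated real and fake clusters every triplet already satisfies the margin and the loss is zero; when the clusters overlap the loss stays positive, which is the signal that pushes the embeddings apart during training. FaceNet itself mines "semi-hard" negatives online instead of sampling uniformly; whether this paper does the same is not stated here.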
Datasets
Celeb-DF and FaceForensics++ (FF++)
Model(s)
Xception, FaceNet, LSTM, 3D Convolutional Neural Network, Triplet Network, Random Forest, Stochastic Gradient Descent
Author countries
India