Deepfake Detection using ImageNet models and Temporal Images of 468 Facial Landmarks

Authors: Christeen T Jose

Published: 2022-08-15 03:32:28+00:00

AI Summary

This paper proposes a novel deepfake detection method based on temporal images. A temporal image encodes the temporal movement of 468 facial landmarks across video frames as spatial relationships, enabling ImageNet-pretrained CNNs to be used for detection.

Abstract

This paper presents our results and findings on the use of temporal images for deepfake detection. We modelled the temporal relations that exist in the movement of 468 facial landmarks across the frames of a given video as spatial relations by constructing an image (referred to as a temporal image) from the pixel values at these facial landmarks. CNNs are capable of recognizing the spatial relationships that exist between the pixels of a given image. Ten different ImageNet models were considered for the study.
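
The sketch below illustrates one plausible way to construct such a temporal image: each video frame contributes one row, built from the pixel values sampled at the 468 landmark positions. The use of MediaPipe Face Mesh as the landmark detector, the OpenCV frame loop, and the build_temporal_image helper are assumptions made for illustration; the paper does not specify its implementation.

```python
# Illustrative sketch only: MediaPipe Face Mesh and the frame budget are assumptions.
import cv2
import numpy as np
import mediapipe as mp

def build_temporal_image(video_path, max_frames=300):
    """Build a (frames x 468 x 3) temporal image: one row of pixel values,
    sampled at the 468 facial landmarks, per video frame."""
    face_mesh = mp.solutions.face_mesh.FaceMesh(static_image_mode=False)
    cap = cv2.VideoCapture(video_path)
    rows = []
    while len(rows) < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        h, w, _ = frame.shape
        result = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if not result.multi_face_landmarks:
            continue  # skip frames where no face is detected
        pts = result.multi_face_landmarks[0].landmark  # 468 normalized points
        row = [frame[int(np.clip(p.y * h, 0, h - 1)),
                     int(np.clip(p.x * w, 0, w - 1))] for p in pts]
        rows.append(row)  # one row per frame
    cap.release()
    face_mesh.close()
    return np.array(rows, dtype=np.uint8)  # the temporal image
```

In practice the resulting image would be resized (and the number of frames fixed) to match the input resolution expected by the chosen ImageNet backbone.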


Key findings
MobileNetV2 achieved the best performance with a test accuracy of 0.97899. Transfer learning significantly improved model performance. VGG16 and VGG19 underperformed, suggesting limitations in their ability to detect patterns in the constructed temporal images.
Approach
The approach converts temporal relationships in facial landmark movements into spatial relationships within a 'temporal image'. This image is then fed into pre-trained ImageNet CNN models for deepfake classification.
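
A hedged sketch of the classification stage follows, using an ImageNet-pretrained MobileNetV2 backbone (the study's best-performing model) with a small binary head. The input size, frozen backbone, head layout, and training settings are assumptions for illustration rather than the authors' exact configuration.

```python
# Sketch of transfer learning on temporal images; hyperparameters are assumptions.
import tensorflow as tf

def build_classifier(input_shape=(224, 224, 3)):
    base = tf.keras.applications.MobileNetV2(
        include_top=False, weights="imagenet", input_shape=input_shape)
    base.trainable = False  # freeze ImageNet features for transfer learning
    inputs = tf.keras.Input(shape=input_shape)
    x = tf.keras.applications.mobilenet_v2.preprocess_input(inputs)
    x = base(x, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # real vs. fake
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Usage: resize each temporal image to the backbone's input size before training,
# e.g. tf.image.resize(temporal_image, (224, 224)).
```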
Datasets
UADFV, Celeb-DF, FaceForensics, DFD (Google), Celeb-DF-v2, FaceForensics++ (experiments primarily used UADFV due to computational limitations)
Model(s)
MobileNet, MobileNetV2, Xception, InceptionResNetV2, InceptionV3, DenseNet121, EfficientNetB0, ResNet50, VGG16, VGG19
Author countries
India