Deepfake Detection Via Facial Feature Extraction and Modeling

Authors: Benjamin Carter, Nathan Dilla, Micheal Callahan, Atuhaire Ambala

Published: 2025-07-24 21:30:51+00:00

AI Summary

This paper proposes a novel deepfake detection approach using only facial landmark data extracted from videos. Instead of directly processing raw video images, the method focuses on identifying subtle inconsistencies in facial movements. This approach achieves promising accuracy across various neural network models.

Abstract

The rise of deepfake technology brings forth new questions about the authenticity of various forms of media found online today. Videos and images generated by artificial intelligence (AI) have become increasingly difficult to differentiate from genuine media, resulting in the need for new models to detect artificially generated media. While many models have attempted to solve this problem, most focus on direct image processing, adapting a convolutional neural network (CNN) or a recurrent neural network (RNN) that interacts directly with the video image data. This paper introduces an approach that uses solely facial landmarks for deepfake detection. Using a dataset consisting of both deepfake and genuine videos of human faces, this paper describes an approach for extracting facial landmarks for deepfake detection, focusing on identifying subtle inconsistencies in facial movements rather than on raw image processing. Experimental results demonstrate that this feature extraction technique is effective across neural network architectures: the same facial landmarks were tested on three neural network models, with promising performance metrics indicating potential for real-world applications. The findings discussed in this paper include RNN and artificial neural network (ANN) models with accuracies of 96% and 93%, respectively, and a CNN model hovering around 78%. This research challenges the assumption that raw image processing is necessary to identify deepfake videos by presenting a facial feature extraction approach that is compatible with various neural network models while requiring fewer parameters.


Key findings
The RNN model achieved the highest accuracy (96%), followed by the ANN (93%) and CNN (77%). The results demonstrate the effectiveness of using facial landmark data for deepfake detection, achieving high accuracy while potentially reducing computational cost compared to raw image processing methods. The study highlights that the feature extraction approach is compatible with different neural network architectures.
Approach
The approach extracts 68 facial landmarks from each frame of a video using Dlib. These landmarks, along with their first, second, and third differentials, are then fed into RNN, CNN, and ANN models for deepfake classification. The models learn to identify inconsistencies in facial movement patterns to detect deepfakes.
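The differential features described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the 68 landmarks have already been extracted (the paper uses Dlib's 68-point detector for that step) and approximates the first, second, and third differentials as frame-to-frame differences via `numpy.diff`, zero-padded so every order keeps the original frame count.

```python
import numpy as np

def landmark_features(landmarks):
    """Build per-frame features from a (frames, 68, 2) landmark array.

    Returns the landmarks stacked with their first, second, and third
    frame-to-frame differences along a new last axis, giving an array
    of shape (frames, 68, 2, 4). The padding choice (zeros at the
    start of each difference order) is an assumption for illustration.
    """
    landmarks = np.asarray(landmarks, dtype=float)
    feats = [landmarks]
    for order in range(1, 4):
        d = np.diff(landmarks, n=order, axis=0)
        # Pad the start so every differential order keeps `frames` rows.
        pad = np.zeros((order,) + landmarks.shape[1:])
        feats.append(np.concatenate([pad, d], axis=0))
    return np.stack(feats, axis=-1)
```

For a 720-frame clip this yields a `(720, 68, 2, 4)` tensor, which any of the three model families could consume after appropriate reshaping.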
Datasets
A Kaggle dataset containing both deepfake and genuine videos of human faces. The dataset was preprocessed to ensure uniform video lengths (720 frames) after feature extraction.
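The paper states only that video lengths were made uniform (720 frames) after feature extraction; a plausible way to do this, sketched below as an assumption rather than the authors' exact preprocessing, is to truncate longer feature sequences and zero-pad shorter ones.

```python
import numpy as np

TARGET_FRAMES = 720  # uniform sequence length used in the paper

def to_uniform_length(seq, target=TARGET_FRAMES):
    """Truncate or zero-pad a per-frame feature sequence to `target` frames.

    `seq` is any array whose first axis indexes frames, e.g. the
    (frames, 68, 2) landmark array. Truncation/zero-padding is an
    illustrative assumption; the paper does not specify the method.
    """
    seq = np.asarray(seq)
    if seq.shape[0] >= target:
        return seq[:target]
    pad = np.zeros((target - seq.shape[0],) + seq.shape[1:], dtype=seq.dtype)
    return np.concatenate([seq, pad], axis=0)
```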
Model(s)
Recurrent Neural Network (RNN), Convolutional Neural Network (CNN), Artificial Neural Network (ANN). The RNN utilized LSTM layers, and the CNN processed image-like structures created from the landmark data.
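The paper says its CNN processed "image-like structures" created from the landmark data but does not give the exact encoding. One plausible layout (an assumption for illustration, not the authors' scheme) is a 2-D array with one row per frame and one column per flattened landmark coordinate, so 2-D convolutions can scan for temporal inconsistencies in facial movement.

```python
import numpy as np

def landmarks_to_image(landmarks):
    """Flatten a (frames, 68, 2) landmark sequence into a (frames, 136)
    single-channel 'image': rows are frames, columns are x/y coordinates.

    This row-per-frame layout is a hypothetical encoding of the
    paper's 'image-like structures', chosen for illustration.
    """
    landmarks = np.asarray(landmarks, dtype=float)
    frames = landmarks.shape[0]
    return landmarks.reshape(frames, -1)
```

Under this encoding a 720-frame clip becomes a 720x136 "image"; an RNN would instead consume the same data as a length-720 sequence of 136-dimensional vectors, which is why a single feature extraction step can serve all three architectures.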
Author countries
UNKNOWN