Deepfake Detection with Optimized Hybrid Model: EAR Biometric Descriptor via Improved RCNN

Authors: Ruchika Sharma, Rudresh Dwivedi

Published: 2025-03-16 07:01:29+00:00

AI Summary

This paper proposes a novel deepfake detection model that uses ear biometric descriptors extracted via an improved RCNN, combined with a hybrid DBN and Bi-GRU classifier, and optimized using the SU-JFO algorithm. The model outperforms existing methods in accuracy, specificity, and precision across various perturbation scenarios (compression, noise, rotation, pose, and illumination).

Abstract

Deepfake technology has been widely employed in recent years to create pernicious content such as fake news, movies, and rumors by altering and substituting facial information from various sources. Given the ongoing evolution of deepfakes, continuous investigation into their identification and prevention is crucial. Due to recent technological advancements in AI (Artificial Intelligence), distinguishing deepfakes and artificially altered images from genuine content has become challenging. This work introduces robust detection of subtle ear movements and shape changes to generate ear descriptors. Further, we propose a novel optimized hybrid deepfake detection model that operates on these ear biometric descriptors, extracted via an enhanced RCNN (Region-Based Convolutional Neural Network). Initially, the input video is converted into frames and preprocessed through resizing, normalization, grayscale conversion, and filtering, followed by face detection using the Viola-Jones technique. Next, a hybrid model comprising a DBN (Deep Belief Network) and a Bi-GRU (Bidirectional Gated Recurrent Unit) is utilized for deepfake detection based on the ear descriptors. The outputs of the two detectors are combined through improved score-level fusion. To enhance performance, the weights of both detection models are optimally tuned using SU-JFO (Self-Upgraded Jellyfish Optimization). Experimentation is conducted on three datasets under several scenarios: compression, noise, rotation, pose, and illumination. The performance results affirm that our proposed method outperforms traditional models such as CNN (Convolutional Neural Network), SqueezeNet, LeNet, LinkNet, LSTM (Long Short-Term Memory), DFP (Deepfake Predictor) [1], and ResNext+CNN+LSTM [2] in terms of various performance metrics, viz., accuracy, specificity, and precision.
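The per-frame preprocessing described in the abstract (resizing, normalization, grayscale conversion, and filtering) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the 3x3 median filter, the 128x128 target size, and the nearest-neighbour resize are all assumptions, and the subsequent Viola-Jones face-detection step would typically be performed with OpenCV's Haar-cascade `cv2.CascadeClassifier` rather than reimplemented.

```python
import numpy as np

def to_grayscale(frame):
    # Luminance-weighted grayscale conversion (ITU-R BT.601 coefficients)
    return frame @ np.array([0.299, 0.587, 0.114])

def resize_nearest(img, out_h, out_w):
    # Nearest-neighbour resize; a stand-in for whatever interpolation the paper uses
    h, w = img.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]

def median_filter3(img):
    # 3x3 median filter to suppress salt-and-pepper noise
    padded = np.pad(img, 1, mode="edge")
    stacked = np.stack([padded[i:i + img.shape[0], j:j + img.shape[1]]
                        for i in range(3) for j in range(3)])
    return np.median(stacked, axis=0)

def preprocess(frame, size=(128, 128)):
    """Grayscale -> resize -> filter -> min-max normalize one video frame."""
    gray = to_grayscale(frame.astype(np.float64))
    small = resize_nearest(gray, *size)
    filtered = median_filter3(small)
    lo, hi = filtered.min(), filtered.max()
    return (filtered - lo) / (hi - lo + 1e-8)
```

After this step, each frame is a normalized single-channel image ready for face (and then ear) localization.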


Key findings
The proposed method outperforms traditional models such as CNN, SqueezeNet, LeNet, LinkNet, LSTM, DFP, and ResNext+CNN+LSTM in accuracy, specificity, and precision across the tested scenarios (compression, noise, rotation, pose, and illumination). The SU-JFO algorithm effectively optimizes the hybrid model's weights, enhancing its performance.
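This summary does not reproduce the "self-upgraded" modifications of SU-JFO, so the sketch below implements only the base jellyfish-search optimizer it builds on: a time-control mechanism that shifts the population from ocean-current motion toward passive/active swarm motion. The population size, iteration budget, and sphere test objective are illustrative assumptions, not values from the paper.

```python
import numpy as np

def jellyfish_search(f, lb, ub, dim=4, pop=20, iters=100, seed=0):
    """Minimal base jellyfish-search optimizer minimizing f over [lb, ub]^dim."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (pop, dim))
    fit = np.array([f(x) for x in X])
    best = X[fit.argmin()].copy()
    best_fit = fit.min()
    for t in range(1, iters + 1):
        # Time-control mechanism: large early -> ocean current, small late -> swarm
        c = abs((1 - t / iters) * (2 * rng.random() - 1))
        for i in range(pop):
            if c >= 0.5:
                # Ocean-current motion toward the best solution
                trend = best - 3 * rng.random() * X.mean(axis=0)
                cand = X[i] + rng.random(dim) * trend
            elif rng.random() > 1 - c:
                # Passive swarm motion: small random step within the box
                cand = X[i] + 0.1 * rng.random(dim) * (ub - lb)
            else:
                # Active swarm motion: move toward (or away from) a random peer
                j = rng.integers(pop)
                step = X[j] - X[i] if fit[j] < fit[i] else X[i] - X[j]
                cand = X[i] + rng.random(dim) * step
            cand = np.clip(cand, lb, ub)
            fc = f(cand)
            if fc < fit[i]:  # greedy replacement
                X[i], fit[i] = cand, fc
                if fc < best_fit:
                    best, best_fit = cand.copy(), fc
    return best, best_fit
```

In the paper's pipeline, the decision variables would be the detector weights rather than a toy sphere function, and the self-upgraded variant would replace these plain update rules.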
Approach
The approach uses a three-stage process: preprocessing (resizing, normalization, grayscale conversion, filtering, and Viola-Jones face detection), feature extraction (improved RCNN-based ear detection, ear attributes, and AAM (Active Appearance Model) features), and detection (a hybrid DBN and Bi-GRU model whose weights are optimized by SU-JFO, with the two outputs combined via improved score-level fusion).
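The final score-level fusion can be illustrated as a weighted combination of the two detectors' outputs. The exact "improved" fusion rule is not specified in this summary, so the snippet below is a plain weighted-average sketch: the weight values stand in for the SU-JFO-tuned parameters, and the 0.5 mean-then-threshold decision rule is an assumption.

```python
import numpy as np

def fuse_scores(score_dbn, score_bigru, w=(0.5, 0.5)):
    """Weighted score-level fusion of per-frame fake probabilities in [0, 1].

    In the paper's pipeline the weights would come from the SU-JFO tuning
    stage; here they are plain placeholders.
    """
    w1, w2 = w
    fused = (w1 * np.asarray(score_dbn) + w2 * np.asarray(score_bigru)) / (w1 + w2)
    return fused

def classify(fused, threshold=0.5):
    # Video-level decision: average the fused frame scores, then threshold
    return "fake" if fused.mean() >= threshold else "real"
```

For example, fusing DBN scores [0.9, 0.8] with Bi-GRU scores [0.7, 0.95] at weights (0.6, 0.4) yields frame scores above the threshold, so the video is labeled fake.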
Datasets
WLDR Dataset, DeepfakeTIMIT Dataset, Celeb-DF Dataset
Model(s)
Improved RCNN, Deep Belief Network (DBN), Bidirectional Gated Recurrent Unit (Bi-GRU)
Author countries
India