Deepfake Detection with Optimized Hybrid Model: EAR Biometric Descriptor via Improved RCNN

Authors: Ruchika Sharma, Rudresh Dwivedi

Published: 2025-03-16 07:01:29+00:00

Comment: Submitted to journal

AI Summary

The paper proposes a novel deepfake detection method leveraging subtle ear movements and shape changes as biometric descriptors. It utilizes an enhanced RCNN for ear detection, followed by a hybrid DBN and Bi-GRU model, with weights optimized by a Self-Upgraded Jellyfish Optimization method and an improved score-level fusion for classification. This approach aims to provide robust deepfake detection across various challenging scenarios.

Abstract

Deepfake technology has been widely used in recent years to create pernicious content such as fake news, movies, and rumors by altering and substituting facial information from various sources. Given the ongoing evolution of deepfakes, continued investigation into their identification and prevention is crucial. Due to recent technological advancements in AI (Artificial Intelligence), distinguishing deepfakes and artificially altered images has become challenging. This approach introduces the robust detection of subtle ear movements and shape changes to generate ear descriptors. Further, we propose a novel optimized hybrid deepfake detection model that considers the ear biometric descriptors via an enhanced RCNN (Region-Based Convolutional Neural Network). Initially, the input video is converted into frames and preprocessed through resizing, normalization, grayscale conversion, and filtering, followed by face detection using the Viola-Jones technique. Next, a hybrid model comprising a DBN (Deep Belief Network) and a Bi-GRU (Bidirectional Gated Recurrent Unit) is utilized for deepfake detection based on ear descriptors. The output from the detection phase is determined through improved score-level fusion. To enhance performance, the weights of both detection models are optimally tuned using SU-JFO (the Self-Upgraded Jellyfish Optimization method). Experimentation is conducted under compression, noise, rotation, pose, and illumination scenarios on three different datasets. The performance results affirm that our proposed method outperforms traditional models such as CNN (Convolutional Neural Network), SqueezeNet, LeNet, LinkNet, LSTM (Long Short-Term Memory), DFP (Deepfake Predictor) [1], and ResNext+CNN+LSTM [2] in terms of various performance metrics, viz. accuracy, specificity, and precision.


Key findings
The proposed SU-JFO-optimized hybrid model consistently outperforms traditional deepfake detection methods (CNN, SqueezeNet, LeNet, LSTM, LinkNet, DFP, ResNext+CNN+LSTM) across all tested scenarios (compression, noise, rotation, pose, illumination) and datasets. It achieves higher accuracy, precision, F-measure, and MCC than these baselines, indicating robust detection of deepfakes. Notably, it reaches an F-measure of 0.928 when 70% of the data is used for training, surpassing the compared models.
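For readers less familiar with the metrics named above, the following is a minimal sketch of how accuracy, precision, specificity, F-measure, and MCC (Matthews correlation coefficient) are conventionally computed from binary confusion-matrix counts. The function name `binary_metrics` and the example counts are hypothetical illustrations and are not taken from the paper's results.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are true/false positives/negatives, with "deepfake"
    treated as the positive class (an assumption for this sketch).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # F-measure (F1) is the harmonic mean of precision and recall.
    pr_sum = precision + recall
    f_measure = 2 * precision * recall / pr_sum if pr_sum else 0.0
    # MCC ranges from -1 to 1; it is set to 0 here when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_measure": f_measure,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only.
example = binary_metrics(tp=45, fp=5, tn=40, fn=10)
print(example)
```

MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when the real and fake classes are imbalanced.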
Approach
The method converts input videos into frames, preprocesses them, and detects faces using Viola-Jones. It then employs an improved RCNN to detect ears and extract biometric descriptors (size, shape, and AAM features). These features are fed into a hybrid deep learning model consisting of DBN and Bi-GRU, whose weights are optimized by the Self-Upgraded Jellyfish Optimization (SU-JFO) algorithm, and the final detection is determined via an improved score-level fusion.
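The final step, score-level fusion, can be illustrated with a minimal sketch. The summary does not specify the paper's improved fusion rule, so the code below assumes a plain weighted average of two per-frame scores followed by a threshold. The function name `fuse_scores`, the weight `w_dbn`, and the threshold of 0.5 are all hypothetical and stand in for whatever the authors actually use.

```python
def fuse_scores(dbn_score, bigru_score, w_dbn=0.5, threshold=0.5):
    """Weighted-average score-level fusion of two detector outputs.

    dbn_score and bigru_score are assumed to be probabilities in [0, 1]
    that the input is a deepfake. This is a generic baseline, not the
    improved fusion described in the paper.
    """
    if not 0.0 <= w_dbn <= 1.0:
        raise ValueError("w_dbn must lie in [0, 1]")
    for score in (dbn_score, bigru_score):
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores must lie in [0, 1]")
    # Convex combination of the two model scores.
    fused = w_dbn * dbn_score + (1.0 - w_dbn) * bigru_score
    # The input is flagged as a deepfake when the fused score reaches the threshold.
    return fused, fused >= threshold


# Hypothetical scores, for illustration only.
fused, is_fake = fuse_scores(dbn_score=0.8, bigru_score=0.4)
print(fused, is_fake)
```

In a setup like this, the fusion weight is one more parameter that an optimizer could tune on validation data, alongside the model weights.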
Datasets
WLDR Dataset (Dataset1), DeepfakeTIMIT Dataset (Dataset2), Celeb-DF Dataset (Dataset3)
Model(s)
Deep Belief Network (DBN), Bidirectional Gated Recurrent Unit (Bi-GRU), Improved Region-Based Convolutional Neural Network (RCNN) with ResNet50 backbone, Viola-Jones algorithm, Self-Upgraded Jellyfish Optimization (SU-JFO).
Author countries
India