Deepfake Detection with Optimized Hybrid Model: EAR Biometric Descriptor via Improved RCNN
Authors: Ruchika Sharma, Rudresh Dwivedi
Published: 2025-03-16 07:01:29+00:00
Comment: Submitted to journal
AI Summary
The paper proposes a novel deepfake detection method leveraging subtle ear movements and shape changes as biometric descriptors. It utilizes an enhanced RCNN for ear detection, followed by a hybrid DBN and Bi-GRU model, with weights optimized by a Self-Upgraded Jellyfish Optimization method and an improved score-level fusion for classification. This approach aims to provide robust deepfake detection across various challenging scenarios.
Abstract
Deepfake technology has been widely used in recent years to create pernicious content such as fake news, movies, and rumors by altering and substituting facial information from various sources. Given the ongoing evolution of deepfakes, continued investigation into their identification and prevention is crucial. Due to recent technological advancements in AI (Artificial Intelligence), distinguishing deepfakes and artificially altered images from genuine ones has become challenging. Our approach introduces the robust detection of subtle ear movements and shape changes to generate ear descriptors. Further, we propose a novel optimized hybrid deepfake detection model that considers the ear biometric descriptors via an enhanced RCNN (Region-Based Convolutional Neural Network). Initially, the input video is converted into frames and preprocessed through resizing, normalization, grayscale conversion, and filtering, followed by face detection using the Viola-Jones technique. Next, a hybrid model comprising a DBN (Deep Belief Network) and a Bi-GRU (Bidirectional Gated Recurrent Unit) is utilized for deepfake detection based on the ear descriptors. The output of the detection phase is determined through improved score-level fusion. To enhance performance, the weights of both detection models are optimally tuned using SU-JFO (the Self-Upgraded Jellyfish Optimization method). Experimentation is conducted based on four scenarios: compression, noise, rotation, pose, and illumination on three different datasets. The results affirm that our proposed method outperforms traditional models such as CNN (Convolutional Neural Network), SqueezeNet, LeNet, LinkNet, LSTM (Long Short-Term Memory), DFP (Deepfake Predictor) [1], and ResNext+CNN+LSTM [2] in terms of various performance metrics, viz. accuracy, specificity, and precision.
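To make the score-level fusion step more concrete, the following is a minimal sketch of plain weighted-sum fusion of two detector outputs. It is not the improved fusion rule of the paper, which the abstract does not specify. The function names `fuse_scores` and `classify`, the weight `dbn_weight`, the decision threshold of 0.5, and the example scores are all assumptions introduced for illustration.

```python
from typing import List, Sequence


def fuse_scores(dbn_scores: Sequence[float],
                bigru_scores: Sequence[float],
                dbn_weight: float = 0.5) -> List[float]:
    """Combine two per-video fake-probability scores by a weighted sum.

    Assumption: both detectors output a score in [0, 1], where higher
    means "more likely fake". The weight is a free parameter here; the
    abstract does not state how the paper's fusion weights are chosen.
    """
    if len(dbn_scores) != len(bigru_scores):
        raise ValueError("both models must score the same number of videos")
    if not 0.0 <= dbn_weight <= 1.0:
        raise ValueError("dbn_weight must lie in [0, 1]")
    return [dbn_weight * d + (1.0 - dbn_weight) * g
            for d, g in zip(dbn_scores, bigru_scores)]


def classify(fused: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Label each fused score as 'fake' or 'real' (hypothetical threshold)."""
    return ["fake" if s >= threshold else "real" for s in fused]


# Made-up scores for three videos, purely for illustration.
dbn = [0.9, 0.2, 0.6]      # hypothetical DBN outputs
bigru = [0.7, 0.1, 0.3]    # hypothetical Bi-GRU outputs

fused = fuse_scores(dbn, bigru, dbn_weight=0.5)
labels = classify(fused)
print(labels)
```

With equal weights, the third video is labeled real even though the DBN alone would flag it, which shows how fusion can temper a single model's decision.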