Unmasking Deep Fakes: Leveraging Deep Learning for Video Authenticity Detection


Authors: Mahmudul Hasan, Sadia Ruhama, Sabrina Tajnim Sithi, Chowdhury Mohammad Mutamir Samit, Oindrila Saha

Published: 2025-05-10 06:19:14+00:00

AI Summary

This paper proposes a deepfake video detection model that uses MTCNN for face detection and EfficientNet-B5 for feature extraction and classification. On the DFDC dataset the model reaches an AUC of 0.9380 and an F1 score of 0.8682, which the authors present as evidence that this hybrid approach is effective.

Abstract

Deepfake videos, produced with advanced artificial intelligence methods, pose a new challenge to the trustworthiness of digital media. As deepfakes become more convincing, detecting them requires advanced methods capable of identifying subtle inconsistencies. The primary motivation of this paper is to recognize deepfake videos using deep learning techniques, specifically convolutional neural networks. Deep learning excels at pattern recognition, which makes it well suited to detecting the intricate manipulations in deepfakes. In this paper, we use MTCNN as the face detector and EfficientNet-B5 as the encoder model to predict whether a video is a deepfake. We use the training and evaluation datasets from the Kaggle DFDC. The results show that our deepfake detection model achieved a log loss of 0.4278, an AUC of 93.80%, and an F1 score of 86.82% on Kaggle's DFDC dataset.

Key findings

The proposed model achieved a log loss of 0.4278, an AUC of 0.9380, and an F1 score of 0.8682 on the DFDC dataset. These results are competitive with other state-of-the-art models, demonstrating the effectiveness of the hybrid MTCNN-EfficientNet-B5 approach. The authors note that while performance is strong, it could be further improved by incorporating attention mechanisms or addressing limitations in face detection under challenging conditions.
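For readers unfamiliar with the reported metrics, the following is a minimal sketch of how binary log loss and F1 are conventionally computed. It is not the authors' evaluation code; the function names, the clipping constant `eps`, and the toy inputs are assumptions made for illustration.

```python
import math

def binary_log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; lower is better.

    y_true: labels, 1 = fake, 0 = real (label convention is an assumption).
    y_prob: predicted probability of the "fake" class per video.
    eps:    clipping constant (assumed value) to avoid log(0).
    """
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) for hard 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom

# Toy data, not taken from the paper.
labels = [1, 1, 0, 0]
probs = [0.9, 0.4, 0.6, 0.1]
preds = [1 if p >= 0.5 else 0 for p in probs]  # 0.5 threshold is an assumption
loss = binary_log_loss(labels, probs)
f1 = f1_score(labels, preds)
```

Log loss penalizes confident wrong predictions heavily, so a model can have a high AUC and still show a comparatively large log loss if its probabilities are poorly calibrated.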

Approach

The approach uses MTCNN to detect faces in video frames and then extracts features from the cropped faces using EfficientNet-B5. A confidence-weighted aggregation strategy combines frame-level predictions to generate a final video-level prediction.
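The summary does not give the exact formula for the confidence-weighted aggregation, so the sketch below shows one plausible form: a weighted mean of per-frame fake probabilities, with a per-frame confidence (for example, the face detector's score) as the weight. The function name, the choice of weight, and the fallback values are all assumptions, not the paper's stated method.

```python
def aggregate_video_score(frame_probs, frame_confs, eps=1e-8):
    """Combine frame-level fake probabilities into one video-level score.

    frame_probs: per-frame probability that the face is fake.
    frame_confs: per-frame non-negative confidence weights
                 (hypothetical: e.g. face-detection confidence).
    Returns a weighted mean; falls back to a plain mean if all weights
    are ~0, and to 0.5 (undecided) if no frames are available.
    """
    if len(frame_probs) != len(frame_confs):
        raise ValueError("frame_probs and frame_confs must have equal length")
    if not frame_probs:
        return 0.5  # assumed neutral score when no face was found
    total_weight = sum(frame_confs)
    if total_weight <= eps:
        return sum(frame_probs) / len(frame_probs)
    weighted = sum(p * c for p, c in zip(frame_probs, frame_confs))
    return weighted / total_weight

# Toy example: the low-confidence frame contributes little to the result.
video_score = aggregate_video_score([0.9, 0.8, 0.2], [0.99, 0.95, 0.10])
```

Weighting by confidence reduces the influence of frames where the face crop is unreliable, which matters when a single blurry or misdetected frame would otherwise pull the video-level score toward the wrong class.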

Datasets

Kaggle DFDC (Deepfake Detection Challenge) dataset

Model(s)

MTCNN (face detector), EfficientNet-B5 (feature extractor and classifier)

Author countries

Bangladesh
