FSBI: Deepfakes Detection with Frequency Enhanced Self-Blended Images

Authors: Ahmed Abul Hasanaath, Hamzah Luqman, Raed Katib, Saeed Anwar

Published: 2024-06-12 20:15:00+00:00

AI Summary

This paper proposes Frequency Enhanced Self-Blended Images (FSBI) for deepfake detection. FSBI uses Discrete Wavelet Transforms (DWT) to extract features from self-blended images, enhancing the detection of subtle forgery artifacts and improving the robustness of a convolutional neural network (CNN) classifier.

Abstract

Advances in deepfake research have led to the creation of almost perfect manipulations undetectable by human eyes and some deepfakes detection tools. Recently, several techniques have been proposed to differentiate deepfakes from realistic images and videos. This paper introduces a Frequency Enhanced Self-Blended Images (FSBI) approach for deepfakes detection. This proposed approach utilizes Discrete Wavelet Transforms (DWT) to extract discriminative features from the self-blended images (SBI) to be used for training a convolutional network architecture model. The SBIs blend the image with itself by introducing several forgery artifacts in a copy of the image before blending it. This prevents the classifier from overfitting to specific artifacts by learning more generic representations. These blended images are then fed into the frequency features extractor to detect artifacts that cannot be detected easily in the time domain. The proposed approach has been evaluated on FF++ and Celeb-DF datasets and the obtained results outperformed the state-of-the-art techniques with the cross-dataset evaluation protocol.
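
The self-blending step described in the abstract can be illustrated with a minimal sketch. It assumes a grayscale image stored as nested lists of floats in [0, 1]. The helper names `box_mask` and `self_blend`, the rectangular mask, and the brightness shift used as the "forgery artifact" are illustrative assumptions, not the authors' implementation, which this summary does not specify.

```python
import random


def box_mask(height, width, margin):
    """Hypothetical blending mask: 1.0 inside a centred box, 0.0 outside."""
    return [
        [
            1.0 if margin <= y < height - margin and margin <= x < width - margin else 0.0
            for x in range(width)
        ]
        for y in range(height)
    ]


def self_blend(image, mask, rng):
    """Blend an image with a slightly altered copy of itself.

    image: rows of floats in [0, 1]
    mask:  rows of floats in [0, 1]; 1.0 means "take the altered copy"
    rng:   a random.Random instance, passed in for reproducibility
    """
    # Assumption: a small brightness shift stands in for the colour and
    # geometric perturbations a real self-blending pipeline would apply.
    shift = rng.uniform(-0.1, 0.1)
    altered = [[min(1.0, max(0.0, p + shift)) for p in row] for row in image]

    # Per-pixel alpha blend between the altered copy and the original.
    return [
        [m * a + (1.0 - m) * p for p, a, m in zip(row, alt_row, mask_row)]
        for row, alt_row, mask_row in zip(image, altered, mask)
    ]


# Usage: an 8x8 gradient image blended with its altered copy inside a box.
rng = random.Random(0)
image = [[(x + y) / 14 for x in range(8)] for y in range(8)]
mask = box_mask(8, 8, margin=2)
fake = self_blend(image, mask, rng)
```

Pixels outside the mask are left untouched, so the only difference between `image` and `fake` is the artificially introduced blending region. In this sketch, that region is what a classifier trained on such pairs would have to pick up on.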


Key findings
FSBI outperforms state-of-the-art techniques in both within-dataset and cross-dataset evaluations on FF++ and Celeb-DF. The use of DWT significantly improves the detection of deepfakes, particularly those generated using methods that introduce subtle artifacts. The ablation studies confirm the importance of the proposed components and hyperparameter choices.
Approach
The approach blends an image with a transformed version of itself to create self-blended images (SBIs). Discrete Wavelet Transforms (DWT) are then applied to extract frequency-domain features from these SBIs, which are subsequently fed into a CNN for classification as real or fake.
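
To make the frequency-extraction step concrete, here is a minimal sketch of a single-level 2D Haar DWT in plain Python. The choice of the Haar wavelet, the single decomposition level, and the function name `haar_dwt2` are assumptions made for illustration. This summary does not state which wavelet family or level the paper uses, nor how the sub-bands are passed to the CNN. In practice a library routine such as `pywt.dwt2` from PyWavelets would normally be used instead.

```python
def haar_dwt2(image):
    """Single-level 2D orthonormal Haar DWT.

    image: rows of floats, with even height and width.
    Returns four half-size sub-bands: (approx, horizontal, vertical, diagonal).
    Sub-band naming and sign conventions differ between libraries.
    """
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("height and width must be even")

    approx, horizontal, vertical, diagonal = [], [], [], []
    for y in range(0, height, 2):
        row_a, row_h, row_v, row_d = [], [], [], []
        for x in range(0, width, 2):
            # The 2x2 block of pixels handled in this step.
            a, b = image[y][x], image[y][x + 1]
            c, d = image[y + 1][x], image[y + 1][x + 1]
            row_a.append((a + b + c + d) / 2)  # low-pass in both directions
            row_h.append((a + b - c - d) / 2)  # top row minus bottom row
            row_v.append((a - b + c - d) / 2)  # left column minus right column
            row_d.append((a - b - c + d) / 2)  # diagonal difference
        approx.append(row_a)
        horizontal.append(row_h)
        vertical.append(row_v)
        diagonal.append(row_d)
    return approx, horizontal, vertical, diagonal


# Usage: a 4x4 image with a vertical edge between columns 0 and 1.
edge = [[0.0, 1.0, 1.0, 1.0] for _ in range(4)]
approx, horizontal, vertical, diagonal = haar_dwt2(edge)
```

In this example the edge shows up only in the `vertical` sub-band, while the `horizontal` and `diagonal` sub-bands stay at zero. This is the sense in which detail sub-bands can isolate localized discontinuities, such as blending boundaries, that are less obvious in raw pixel values.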
Datasets
FaceForensics++ (FF++), specifically the FF++(LQ) variant, and Celeb-DF
Model(s)
EfficientNet-B5 (pre-trained on ImageNet and fine-tuned for deepfake detection)
Author countries
Saudi Arabia