DeepFeatureX Net: Deep Features eXtractors based Network for discriminating synthetic from real images

Authors: Orazio Pontorno, Luca Guarnera, Sebastiano Battiato

Published: 2024-04-24 07:25:36+00:00

AI Summary

This paper proposes DeepFeatureX Net, a novel deepfake detection approach using three base models to extract discriminative features from different image classes (real, GAN-generated, and diffusion model-generated). The concatenated features are then processed to classify the image origin, showing robustness to JPEG compression and outperforming state-of-the-art methods in generalization tests.

Abstract

Deepfakes, synthetic images generated by deep learning algorithms, represent one of the biggest challenges in the field of Digital Forensics. The scientific community is working to develop approaches that can discriminate the origin of digital images (real or AI-generated). However, these methodologies face the challenge of generalization, that is, the ability to discern the nature of an image even if it is generated by an architecture not seen during training. This usually leads to a drop in performance. In this context, we propose a novel approach based on three blocks called Base Models, each of which is responsible for extracting the discriminative features of a specific image class (Diffusion Model-generated, GAN-generated, or real), as it is trained on a deliberately unbalanced dataset. The features extracted from each block are then concatenated and processed to discriminate the origin of the input image. Experimental results show that this approach not only demonstrates good robustness to JPEG compression but also outperforms state-of-the-art methods in several generalization tests. Code, models and dataset are available at https://github.com/opontorno/block-based_deepfake-detection.


Key findings
DeepFeatureX Net achieves high accuracy and F1-score on both raw and JPEG-compressed images. It outperforms state-of-the-art methods in generalization tests, especially on images produced by models unseen during training or by a mix of GAN and diffusion architectures. DenseNet backbones yield particularly strong results.
Approach
DeepFeatureX Net trains three separate base models, each specialized in one image class (real, GAN-generated, or DM-generated) by training it on a deliberately unbalanced dataset. The features extracted by the three models are concatenated and fed into a custom 1D CNN for the final classification. This design improves both generalization and robustness.
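The extract-concatenate-classify pipeline described above can be sketched as a small PyTorch module. This is a simplified stand-in, not the authors' code: the tiny convolutional base models, the feature width, and the MLP head are all illustrative placeholders (the paper uses pretrained backbones such as DenseNet and a custom 1D CNN head).

```python
import torch
import torch.nn as nn

class DeepFeatureXSketch(nn.Module):
    """Three class-specialized extractors whose features are concatenated
    and classified. Simplified stand-in for the paper's Base Models +
    final classifier; all layer sizes here are illustrative."""

    def __init__(self, feat_dim: int = 128, n_classes: int = 3):
        super().__init__()

        def base_model() -> nn.Sequential:
            # Placeholder for one Base Model; in the paper each is a
            # pretrained backbone trained on an unbalanced dataset
            # emphasizing its target class (real / GAN / DM).
            return nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, feat_dim),
            )

        self.base_real = base_model()
        self.base_gan = base_model()
        self.base_dm = base_model()
        # Final classifier over the concatenated features; the paper uses
        # a custom 1D CNN here, a small MLP stands in for it.
        self.head = nn.Sequential(
            nn.Linear(3 * feat_dim, 64), nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = torch.cat(
            [self.base_real(x), self.base_gan(x), self.base_dm(x)], dim=1)
        return self.head(feats)

model = DeepFeatureXSketch()
logits = model(torch.randn(2, 3, 64, 64))  # batch of 2 RGB images
print(tuple(logits.shape))  # (2, 3): one logit per class per image
```

In the actual method the three base models are trained first, each on its own unbalanced split, and only then is the fused classifier trained on the concatenated features.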
Datasets
A dataset of 72,334 images: 19,334 real images from CelebA, FFHQ, and other sources; 37,572 GAN-generated images (GauGAN, BigGAN, ProGAN, StarGAN, AttGAN, GDWCT, CycleGAN, StyleGAN, StyleGAN2, StyleGAN3); and 15,423 DM-generated images (DALL-E MINI 1, DALL-E 2, Latent Diffusion, Stable Diffusion).
Model(s)
DenseNet121, DenseNet161, DenseNet169, DenseNet201, EfficientNet b0, EfficientNet b4, ResNet 18, ResNet 34, ResNet 50, ResNet 101, ResNet 152, ResNeXt 101, ViT b16, ViT b32 (as backbones for base models); a custom 1D CNN for final classification.
Author countries
Italy