DeepFeatureX Net: Deep Features eXtractors based Network for discriminating synthetic from real images

Authors: Orazio Pontorno, Luca Guarnera, Sebastiano Battiato

Published: 2024-04-24 07:25:36+00:00

AI Summary

This paper introduces DeepFeatureX Net, a novel approach to address the generalization challenge in discriminating synthetic from real images. The method employs three specialized Base Models, each trained on deliberately unbalanced datasets to extract discriminative features specific to Diffusion Model-generated, GAN-generated, or real images. These extracted features are then concatenated and processed by a custom network to determine the image's origin.

Abstract

Deepfakes, synthetic images generated by deep learning algorithms, represent one of the biggest challenges in the field of Digital Forensics. The scientific community is working to develop approaches that can discriminate the origin of digital images (real or AI-generated). However, these methodologies face the challenge of generalization, that is, the ability to discern the nature of an image even if it is generated by an architecture not seen during training. This usually leads to a drop in performance. In this context, we propose a novel approach based on three blocks called Base Models, each of which is responsible for extracting the discriminative features of a specific image class (Diffusion Model-generated, GAN-generated, or real), as it is trained on a deliberately unbalanced dataset. The features extracted from each block are then concatenated and processed to discriminate the origin of the input image. Experimental results show that this approach is not only robust to JPEG compression but also outperforms state-of-the-art methods in several generalization tests. Code, models and dataset are available at https://github.com/opontorno/block-based_deepfake-detection.


Key findings
The DeepFeatureX Net demonstrates good robustness against JPEG compression, maintaining high performance even at low Quality Factors, despite being trained on raw images. It significantly outperforms state-of-the-art methods in generalization tests, achieving over 10% higher classification accuracy when discriminating between images generated by both known and unseen architectures, as well as mixed generative technologies.
Approach
The DeepFeatureX Net uses three Base Models, which are pre-trained CNN backbones, as specialized feature extractors. Each Base Model is trained using an unbalanced dataset to focus on a specific image class (DM-generated, GAN-generated, or real). Features extracted from these frozen Base Models are concatenated and then fed into a custom CNN for final classification.
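The extract-and-concatenate step described above can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' implementation: `TinyBackbone`, `freeze`, and `ConcatFeatureExtractor` are hypothetical names, the tiny backbone stands in for a pre-trained CNN so that the example runs without downloading weights, and the feature dimension of 64 is an arbitrary assumption.

```python
import torch
import torch.nn as nn


class TinyBackbone(nn.Module):
    """Hypothetical stand-in for a pre-trained CNN backbone (illustration only)."""

    def __init__(self, feature_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(16, feature_dim),
        )

    def forward(self, x):
        return self.net(x)


def freeze(module: nn.Module) -> nn.Module:
    """Disable gradients and switch the module to eval mode."""
    for p in module.parameters():
        p.requires_grad = False
    return module.eval()


class ConcatFeatureExtractor(nn.Module):
    """Runs three frozen Base Models and concatenates their feature vectors."""

    def __init__(self, dm_model, gan_model, real_model):
        super().__init__()
        self.base_models = nn.ModuleList(
            [freeze(dm_model), freeze(gan_model), freeze(real_model)]
        )

    def train(self, mode: bool = True):
        # Keep the Base Models in eval mode even if the parent is set to train.
        super().train(mode)
        for m in self.base_models:
            m.eval()
        return self

    def forward(self, x):
        with torch.no_grad():
            feats = [m(x) for m in self.base_models]
        # Resulting shape: (batch, 3 * feature_dim)
        return torch.cat(feats, dim=1)


extractor = ConcatFeatureExtractor(TinyBackbone(), TinyBackbone(), TinyBackbone())
features = extractor(torch.randn(2, 3, 32, 32))
print(features.shape)
```

In a real setup, each stand-in would presumably be replaced by one of the trained Base Models, with its classification layer removed so that it returns features rather than class scores.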
Datasets
Real images: CelebA, FFHQ, AFHQ, ImageNet, COCO.
GAN-generated images: GauGAN, BigGAN, ProGAN, StarGAN, AttGAN, GDWCT, CycleGAN, StyleGAN, StyleGAN2, StyleGAN3, GANformer, Denoising DiffusionGANs, DiffusionGANs, ProjectedGANs, Taming Transformers.
DM-generated images: DALL-E MINI, DALL-E 2, Latent Diffusion, Stable Diffusion 2, COCOFake (from Stable Diffusion 3), VQ Diffusion, Denoising Diffusion Probabilistic Model (DDPM), COCOGlide (from Glide).
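Each Base Model is said to be trained on a deliberately unbalanced dataset built from these three classes. One way such a split might be constructed is sketched below. The function name `make_unbalanced_split`, the choice to over-represent the focus class, and the 25% fraction kept from the other classes are all assumptions made for illustration; this summary does not state the direction or the ratio of the imbalance.

```python
import random


def make_unbalanced_split(samples_by_class, focus_class, other_fraction=0.25, seed=0):
    """Keep all samples of the focus class and a random fraction of every other class.

    ASSUMPTION: the focus class is the over-represented one; the actual
    imbalance used by the authors is not given in this summary.
    """
    rng = random.Random(seed)
    split = []
    for label, samples in samples_by_class.items():
        samples = list(samples)
        if label == focus_class:
            kept = samples
        else:
            kept = rng.sample(samples, int(len(samples) * other_fraction))
        split.extend((sample, label) for sample in kept)
    rng.shuffle(split)
    return split


# Hypothetical file names, 100 per class.
data = {
    "dm": [f"dm_{i}.png" for i in range(100)],
    "gan": [f"gan_{i}.png" for i in range(100)],
    "real": [f"real_{i}.png" for i in range(100)],
}
gan_focused = make_unbalanced_split(data, focus_class="gan")
print(len(gan_focused))
```

Calling the function once per class (`"dm"`, `"gan"`, `"real"`) would give three differently skewed training sets, one per Base Model.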
Model(s)
Base Model backbones include DenseNet (121, 161, 169, 201), EfficientNet (b0, b4), ResNet (18, 34, 50, 101, 152), ResNeXt 101, and ViT (b16, b32), all pre-trained on ImageNet. The final processing unit is a custom CNN consisting of five 1D convolution layers, followed by Global Average Pooling and a three-node linear output classifier.
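A rough PyTorch sketch of a processing unit with this shape (five 1D convolutions, global average pooling, three-node linear output) is given below. Only the layer count and the output size come from the description above. The class name `Conv1dHead`, the channel widths, the kernel size, the ReLU activations, the treatment of the concatenated features as a single-channel sequence, and the class ordering are assumptions.

```python
import torch
import torch.nn as nn


class Conv1dHead(nn.Module):
    """Five Conv1d layers, global average pooling, and a 3-node linear classifier."""

    def __init__(self, num_classes: int = 3):
        super().__init__()
        widths = [1, 16, 32, 64, 128, 256]  # assumed channel widths
        layers = []
        for c_in, c_out in zip(widths[:-1], widths[1:]):
            # kernel_size=3 with padding=1 keeps the sequence length unchanged
            layers += [nn.Conv1d(c_in, c_out, kernel_size=3, padding=1), nn.ReLU()]
        self.convs = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool1d(1)  # global average pooling over length
        self.classifier = nn.Linear(widths[-1], num_classes)

    def forward(self, features):
        # features: (batch, total_feature_dim); add a channel axis for Conv1d
        x = features.unsqueeze(1)
        x = self.convs(x)
        x = self.pool(x).squeeze(-1)  # (batch, 256)
        return self.classifier(x)  # logits; class order (DM, GAN, real) is assumed


head = Conv1dHead()
logits = head(torch.randn(4, 192))  # 192 = 3 concatenated vectors of size 64 (assumed)
print(logits.shape)
```

Because of the global average pooling, the head accepts concatenated feature vectors of any length, which would make it independent of the backbone chosen for the Base Models.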
Author countries
Italy