Aggregating Layers for Deepfake Detection

Authors: Amir Jevnisek, Shai Avidan

Published: 2022-10-11 14:29:47+00:00

AI Summary

This paper proposes a deepfake detection algorithm that aggregates features from all layers of a convolutional neural network backbone, rather than classifying from the final-layer features alone, to improve robustness to deepfake algorithms not seen during training. The authors report state-of-the-art results on both deepfake detection and synthetic image detection.

Abstract

The increasing popularity of facial manipulation (Deepfakes) and synthetic face creation raises the need to develop robust forgery detection solutions. Crucially, most work in this domain assumes that the Deepfakes in the test set come from the same Deepfake algorithms that were used for training the network. This is not how things work in practice. Instead, we consider the case where the network is trained on one Deepfake algorithm and tested on Deepfakes generated by another algorithm. Typically, supervised techniques follow a pipeline of visual feature extraction from a deep backbone, followed by a binary classification head. Instead, our algorithm aggregates features extracted across all layers of one backbone network to detect a fake. We evaluate our approach on two domains of interest, Deepfake detection and synthetic image detection, and find that we achieve SOTA results.


Key findings
The proposed method achieves state-of-the-art results in cross-dataset generalization for both deepfake and synthetic image detection. The layer aggregation technique demonstrates superior robustness to unseen deepfake algorithms, and analysis reveals which features contribute most to the classification, enabling network trimming and fake region analysis.
Approach
The approach aggregates features extracted from all layers of a backbone network (EfficientNet-V2-Small) using skip connections to a final linear regression classification head. This allows the model to utilize features from various receptive fields and improve generalization across different deepfake algorithms and synthetic image generators.
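The aggregation idea can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the global average pooling step, the toy layer shapes, the random weights, and the sigmoid output are all assumptions made for illustration, and the function names (`global_average_pool`, `aggregate_layers`, `linear_head`) are hypothetical. The sketch only shows how per-layer feature maps of different sizes could be reduced to one vector and fed to a single linear head.

```python
import math
import random


def global_average_pool(feature_map):
    """Collapse a C x H x W feature map (nested lists) to a length-C vector.

    Assumption: the summary does not say how each layer is reduced;
    average pooling is used here only as a simple stand-in.
    """
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def aggregate_layers(feature_maps):
    """Concatenate the pooled descriptor of every layer into one vector."""
    aggregated = []
    for fmap in feature_maps:
        aggregated.extend(global_average_pool(fmap))
    return aggregated


def linear_head(features, weights, bias):
    """One linear layer plus a sigmoid, giving a 'fake' score in (0, 1)."""
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def random_feature_map(channels, height, width, rng):
    """Stand-in for the activations a real backbone layer would produce."""
    return [
        [[rng.random() for _ in range(width)] for _ in range(height)]
        for _ in range(channels)
    ]


rng = random.Random(0)

# Hypothetical (channels, height, width) shapes for three backbone layers:
# early layers have few channels and high resolution, later layers the reverse.
layer_shapes = [(4, 8, 8), (8, 4, 4), (16, 2, 2)]
feature_maps = [random_feature_map(c, h, w, rng) for c, h, w in layer_shapes]

aggregated = aggregate_layers(feature_maps)  # one entry per channel, all layers
weights = [rng.uniform(-0.1, 0.1) for _ in aggregated]  # untrained, illustrative
score = linear_head(aggregated, weights, bias=0.0)
```

Because every layer contributes its own block of entries to `aggregated`, the magnitude of the learned weights in each block gives a rough view of which layers the head relies on, which is the kind of analysis the key findings mention.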
Datasets
FaceForensics++ (with the Deepfakes, Face2Face, FaceSwap, and NeuralTextures manipulations), CelebA-HQ, FFHQ, and synthetic images generated using PGAN, StyleGAN, StyleGAN2, Glow, and GMM.
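The train-on-one, test-on-another setting described in the abstract can be enumerated with a short sketch. This summary does not state which train/test pairs the paper actually evaluates, so listing every ordered pair of the four FaceForensics++ manipulations is an assumption, and `cross_algorithm_splits` is a hypothetical helper name.

```python
from itertools import permutations

# Manipulation names as listed for FaceForensics++ in this summary.
MANIPULATIONS = ["Deepfakes", "Face2Face", "FaceSwap", "NeuralTextures"]


def cross_algorithm_splits(manipulations):
    """Return every (train_algorithm, test_algorithm) pair with train != test."""
    return list(permutations(manipulations, 2))


splits = cross_algorithm_splits(MANIPULATIONS)
for train_alg, test_alg in splits:
    print(f"train on {train_alg:<15} -> test on {test_alg}")
```

With four manipulations this gives 12 ordered pairs, none of which tests a detector on the algorithm it was trained on.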
Model(s)
EfficientNet-V2-Small as the backbone network with a custom layer aggregation module and a linear regression classification head.
Author countries
Israel