D4: Detection of Adversarial Diffusion Deepfakes Using Disjoint Ensembles

Authors: Ashish Hooda, Neal Mangaokar, Ryan Feng, Kassem Fawaz, Somesh Jha, Atul Prakash

Published: 2022-02-11 15:21:11+00:00

AI Summary

This paper introduces D4, a deepfake detector that enhances black-box adversarial robustness by using an ensemble of models trained on disjoint subsets of the frequency spectrum. This approach reduces the dimensionality of the adversarial subspace, making it harder for attackers to generate adversarial deepfakes that evade detection.

Abstract

Detecting diffusion-generated deepfake images remains an open problem. Current detection methods fail against an adversary who adds imperceptible adversarial perturbations to the deepfake to evade detection. In this work, we propose Disjoint Diffusion Deepfake Detection (D4), a deepfake detector designed to improve black-box adversarial robustness beyond de facto solutions such as adversarial training. D4 uses an ensemble of models over disjoint subsets of the frequency spectrum to significantly improve adversarial robustness. Our key insight is to leverage a redundancy in the frequency domain and apply a saliency partitioning technique to disjointly distribute frequency components across multiple models. We formally prove that these disjoint ensembles lead to a reduction in the dimensionality of the input subspace where adversarial deepfakes lie, thereby making adversarial deepfakes harder to find for black-box attacks. We then empirically validate the D4 method against several black-box attacks and find that D4 significantly outperforms existing state-of-the-art defenses applied to diffusion-generated deepfake detection. We also demonstrate that D4 provides robustness against adversarial deepfakes from unseen data distributions as well as unseen generative techniques.
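The disjoint ensemble can be written compactly as below. The notation is ours, not necessarily the paper's: x is an image flattened to a vector, D is an orthonormal DCT (so D^T D = I), S_1 to S_N are disjoint sets of frequency indices, P_i is the 0/1 diagonal mask that keeps only the coefficients in S_i, and f_i is the i-th base detector (output 1 = fake). The strict-majority threshold is an assumed voting rule.

```latex
F(x) = \mathbb{1}\!\left[\sum_{i=1}^{N} f_i\!\left(D^{\top} P_i D\, x\right) > \frac{N}{2}\right],
\qquad
D^{\top} P_i D\,(x + \delta) = D^{\top} P_i D\, x + D^{\top} P_i D\, \delta .
```

The second identity shows that a perturbation δ reaches model i only through its own frequency set S_i. If every base detector flags a deepfake x, an evasive δ must flip at least ⌈N/2⌉ of them, and each flip has to come through a different, non-overlapping set of frequency components.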


Key findings
D4 significantly outperforms state-of-the-art defenses against black-box attacks on diffusion-generated deepfakes, reducing attack success rates to 28% compared to over 90% for baselines. This robustness extends to unseen data distributions and generative techniques.
Approach
D4 exploits redundancy in the frequency domain of deepfake images. A saliency partitioning technique distributes the frequency components into disjoint subsets, and each subset is fed to a separate adversarially trained model. A voting mechanism combines the models' predictions into the final real-or-fake decision, which improves robustness against black-box attacks.
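The partitioning and voting steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the saliency map is taken as given (how it is computed is not specified here), and both the round-robin assignment of saliency-ranked components in partition_by_saliency and the tie rule in majority_vote (ties count as real) are hypothetical choices made for illustration.

```python
import numpy as np


def partition_by_saliency(saliency, n_models):
    """Split frequency components into n_models disjoint boolean masks.

    Hypothetical scheme: rank components by saliency (most salient first) and
    deal them out round-robin, so every model gets a similar mix of high- and
    low-saliency frequencies.
    """
    order = np.argsort(saliency.ravel())[::-1]  # flat indices, most salient first
    masks = np.zeros((n_models, saliency.size), dtype=bool)
    for rank, idx in enumerate(order):
        masks[rank % n_models, idx] = True  # each component goes to exactly one model
    return masks.reshape((n_models,) + saliency.shape)


def majority_vote(predictions):
    """Combine per-model binary predictions (1 = fake, 0 = real).

    Returns 1 only if strictly more than half of the models say fake;
    ties count as real (an assumed tie rule).
    """
    predictions = np.asarray(predictions)
    return int(2 * predictions.sum() > predictions.size)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    saliency = rng.random((8, 8))  # placeholder saliency map, not real data
    masks = partition_by_saliency(saliency, 4)
    coverage = masks.sum(axis=0)  # how many models see each component
    print(coverage.min(), coverage.max())  # 1 1: disjoint and covering
    print(majority_vote([1, 0, 1]))  # 1
```

Dealing components out round-robin rather than in contiguous saliency blocks keeps any single model from holding all of the most informative frequencies, which is one plausible way to keep the individual detectors comparably accurate.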
Datasets
UNKNOWN
Model(s)
An ensemble of CNN models, each trained on a disjoint subset of frequency components extracted via Discrete Cosine Transform (DCT).
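The DCT step can be sketched with NumPy alone. The sketch assumes single-channel images and an orthonormal 2-D DCT-II. interleaved_masks is a hypothetical stand-in for the saliency partitioning, and feeding each model the inverse-transformed (pixel-domain) image rather than the raw masked coefficients is an assumption, not a detail confirmed by this summary.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix (rows are basis vectors)."""
    k = np.arange(n)[:, None]  # frequency index
    i = np.arange(n)[None, :]  # sample index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # DC row uses the smaller normalization
    return c


def dct2(image):
    """2-D orthonormal DCT-II of a single-channel image of shape (h, w)."""
    h, w = image.shape
    return dct_matrix(h) @ image @ dct_matrix(w).T


def idct2(coeffs):
    """Inverse of dct2: the basis is orthonormal, so its inverse is its transpose."""
    h, w = coeffs.shape
    return dct_matrix(h).T @ coeffs @ dct_matrix(w)


def interleaved_masks(shape, n_models):
    """Hypothetical stand-in for saliency partitioning: assign coefficients
    by flat index modulo n_models, giving disjoint boolean masks."""
    flat = np.arange(shape[0] * shape[1]).reshape(shape)
    return [flat % n_models == m for m in range(n_models)]


def model_inputs(image, masks):
    """One input per model: keep only that model's DCT coefficients, then
    map back to the pixel domain (assumed input format)."""
    coeffs = dct2(image)
    return [idct2(np.where(mask, coeffs, 0.0)) for mask in masks]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((8, 6))  # placeholder image, not real data
    parts = model_inputs(image, interleaved_masks(image.shape, 3))
    print(np.allclose(sum(parts), image))  # True
```

Because the masks are disjoint and together cover every coefficient, and the DCT is linear, the per-model images sum back to the original image; this is a quick sanity check that no frequency component is dropped or shared.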
Author countries
USA