Frequency Masking for Universal Deepfake Detection

Authors: Chandler Timm Doloriel, Ngai-Man Cheung

Published: 2024-01-12 11:02:12+00:00

AI Summary

This paper introduces a novel deepfake detection method that uses frequency masking in a supervised learning setting. Unlike most existing methods, which primarily target the spatial domain, the approach improves generalization by focusing on frequency-domain features, achieving substantial performance gains over state-of-the-art techniques.

Abstract

We study universal deepfake detection. Our goal is to detect synthetic images from a range of generative AI approaches, particularly from emerging ones which are unseen during training of the deepfake detector. Universal deepfake detection requires outstanding generalization capability. Motivated by recently proposed masked image modeling which has demonstrated excellent generalization in self-supervised pre-training, we make the first attempt to explore masked image modeling for universal deepfake detection. We study spatial and frequency domain masking in training deepfake detectors. Based on empirical analysis, we propose a novel deepfake detector via frequency masking. Our focus on frequency domain is different from the majority, which primarily target spatial domain detection. Our comparative analyses reveal substantial performance gains over existing methods. Code and models are publicly available.


Key findings

Frequency masking significantly outperforms spatial masking in deepfake detection. The proposed method consistently improves the mean average precision (mAP) of existing state-of-the-art deepfake detectors across various generative models. A 15% masking ratio yields optimal results.
Approach

The authors propose training a deepfake detector using frequency masking. Images are transformed to the frequency domain with the FFT, specific frequencies are masked, and the inverse FFT is applied to produce masked images for training. This process forces the model to learn more robust and generalizable features.
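The masking pipeline above can be sketched as follows. This is a minimal illustration, assuming a grayscale image as a 2-D NumPy array and a uniformly random choice of masked coefficients; the paper's exact masking strategy may differ, and only the 15% ratio is taken from the reported results.

```python
import numpy as np

def frequency_mask(image, mask_ratio=0.15, rng=None):
    """Mask a random fraction of frequency coefficients, then invert back.

    Illustrative sketch: the paper's actual frequency-selection scheme
    is not reproduced here, only the FFT -> mask -> inverse-FFT pipeline.
    """
    rng = rng or np.random.default_rng()
    # Transform to the frequency domain; shift so low frequencies are centred.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    # Randomly drop roughly `mask_ratio` of the coefficients.
    keep = rng.random(spectrum.shape) >= mask_ratio
    masked = spectrum * keep
    # Invert back to the spatial domain; take the real part, since
    # asymmetric masking can introduce a small imaginary component.
    recon = np.fft.ifft2(np.fft.ifftshift(masked))
    return np.real(recon).astype(image.dtype)

# Example: mask 15% of frequencies in a random 8x8 "image".
img = np.random.default_rng(0).random((8, 8))
out = frequency_mask(img, mask_ratio=0.15, rng=np.random.default_rng(1))
```

The masked images `out` would then be fed to the detector during training in place of the originals.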
Datasets

ProGAN is used for training and validation. Testing covers GAN models (ProGAN, CycleGAN, BigGAN, StyleGAN, GauGAN, StarGAN), DeepFake, low-level vision models (SITD and SAN), perceptual loss models (CRN and IMLE), and diffusion models (Guided Diffusion, Latent Diffusion, Glide, DALL-E-mini).
Model(s)

The paper does not specify a new detector architecture. Instead, it incorporates the proposed frequency masking technique into existing state-of-the-art detectors (Wang et al. [7], Gragnaniello et al. [2], Ojha et al. [1]).
Author countries

Singapore