Beyond the Spectrum: Detecting Deepfakes via Re-Synthesis

Authors: Yang He, Ning Yu, Margret Keuper, Mario Fritz

Published: 2021-05-29 21:22:24+00:00

AI Summary

This paper proposes a novel deepfake detection method that re-synthesizes images using tasks like super-resolution, denoising, and colorization, extracting visual cues from the reconstruction errors for detection. This approach is shown to be more effective, to generalize better across different GANs, and to be more robust against perturbations than methods relying on frequency artifacts.

Abstract

The rapid advances in deep generative models over the past years have led to highly realistic media, known as deepfakes, that are commonly indistinguishable from real to human eyes. These advances make assessing the authenticity of visual data increasingly difficult and pose a misinformation threat to the trustworthiness of visual content in general. Although recent work has shown strong detection accuracy of such deepfakes, the success largely relies on identifying frequency artifacts in the generated images, which will not yield a sustainable detection approach as generative models continue evolving and closing the gap to real images. In order to overcome this issue, we propose a novel fake detection method that is designed to re-synthesize testing images and extract visual cues for detection. The re-synthesis procedure is flexible, allowing us to incorporate a series of visual tasks - we adopt super-resolution, denoising, and colorization as the re-synthesis tasks. We demonstrate the improved effectiveness, cross-GAN generalization, and robustness against perturbations of our approach in a variety of detection scenarios involving multiple generators over the CelebA-HQ, FFHQ, and LSUN datasets. Source code is available at https://github.com/SSAW14/BeyondtheSpectrum.


Key findings
The proposed re-synthesis-based approach outperforms existing methods in accuracy and robustness across various GANs and datasets. It shows superior cross-GAN generalization and resilience against perturbations designed to mask deepfakes. Using hierarchical artifacts from the re-synthesizer significantly improves performance.
Approach
The authors propose a two-component model: a re-synthesizer trained on real images to perform image manipulation tasks (super-resolution, denoising, colorization) and a classifier trained on the residuals (reconstruction errors) from the re-synthesizer. The classifier learns to distinguish real from fake images based on these residuals, which are less susceptible to manipulation than frequency artifacts.
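The residual-based pipeline above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the toy `resynthesize` function below stands in for the learned 4x super-resolution network (trained on real images only) by doing simple block-average downsampling followed by nearest-neighbour upsampling, and a residual-energy score stands in for the trained ResNet-50 classifier. All function names and parameters here are hypothetical.

```python
import numpy as np

def resynthesize(img: np.ndarray, scale: int = 4) -> np.ndarray:
    """Toy stand-in for the re-synthesizer: downsample by `scale`
    via block averaging, then upsample by nearest-neighbour
    repetition. (The paper instead uses a learned 4x
    super-resolution model trained only on real images.)"""
    h, w = img.shape
    assert h % scale == 0 and w % scale == 0, "dims must divide scale"
    small = img.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))
    return np.repeat(np.repeat(small, scale, axis=0), scale, axis=1)

def residual(img: np.ndarray) -> np.ndarray:
    """Reconstruction error between the input and its re-synthesis.
    In the paper, (hierarchical) residuals like this are the
    classifier's input features."""
    return img - resynthesize(img)

def residual_energy(img: np.ndarray) -> float:
    """Scalar summary of the residual; a trained classifier would
    replace this crude statistic."""
    return float(np.mean(residual(img) ** 2))
```

As a sanity check, an image whose high-frequency content the re-synthesizer cannot reproduce (here simulated by added noise) yields a larger residual than a smooth image, which is the kind of cue the real classifier learns to exploit:

```python
rng = np.random.default_rng(0)
smooth = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))
noisy = smooth + 0.1 * rng.standard_normal((64, 64))
print(residual_energy(smooth) < residual_energy(noisy))  # True
```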
Datasets
CelebA-HQ, FFHQ, LSUN
Model(s)
ResNet-50, VGG (pretrained for perceptual loss), a custom 4x super-resolution model
Author countries
Germany, United States