Rethinking the Up-Sampling Operations in CNN-based Generative Network for Generalizable Deepfake Detection

Authors: Chuangchuang Tan, Huan Liu, Yao Zhao, Shikui Wei, Guanghua Gu, Ping Liu, Yunchao Wei

Published: 2023-12-16 14:27:06+00:00

AI Summary

This paper introduces Neighboring Pixel Relationships (NPR), a novel artifact representation for deepfake detection that focuses on the local pixel interdependence caused by upsampling operations in generative networks. Evaluated on images from 28 distinct generative models, NPR achieves state-of-the-art performance, with a reported 11.6% improvement over existing methods.

Abstract

Recently, the proliferation of highly realistic synthetic images, facilitated through a variety of GANs and Diffusions, has significantly heightened the susceptibility to misuse. While the primary focus of deepfake detection has traditionally centered on the design of detection algorithms, an investigative inquiry into the generator architectures has remained conspicuously absent in recent years. This paper contributes to this lacuna by rethinking the architectures of CNN-based generators, thereby establishing a generalized representation of synthetic artifacts. Our findings illuminate that the up-sampling operator can, beyond frequency-based artifacts, produce generalized forgery artifacts. In particular, the local interdependence among image pixels caused by upsampling operators is significantly demonstrated in synthetic images generated by GAN or diffusion. Building upon this observation, we introduce the concept of Neighboring Pixel Relationships (NPR) as a means to capture and characterize the generalized structural artifacts stemming from up-sampling operations. A comprehensive analysis is conducted on an open-world dataset, comprising samples generated by 28 distinct generative models. This analysis culminates in the establishment of a novel state-of-the-art performance, showcasing a remarkable 11.6% improvement over existing methods. The code is available at https://github.com/chuangchuangtan/NPR-DeepfakeDetection.


Key findings
NPR outperforms existing state-of-the-art deepfake detection methods across 28 different generative models, with a reported 11.6% improvement. The approach generalizes across both GAN and diffusion models, even when trained on a single GAN source (ProGAN). In the reported results, local spatial analysis of upsampling artifacts outperforms global frequency-based approaches.
Approach
The authors analyze the local interdependence of pixels in images generated by GANs and diffusion models, caused by upsampling operations. They propose NPR, which captures these relationships as an artifact representation, used to train a deepfake detection model. This approach leverages local spatial information rather than global frequency analysis.
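The idea of a neighboring-pixel representation can be illustrated with a minimal NumPy sketch. It assumes non-overlapping 2x2 blocks and uses the top-left pixel of each block as the reference; the function name `npr_map` and these choices are illustrative assumptions, not necessarily the authors' exact formulation (see their repository for the actual implementation).

```python
import numpy as np


def npr_map(image: np.ndarray, block: int = 2) -> np.ndarray:
    """Hypothetical sketch of a neighboring-pixel-relationship map.

    Each pixel is replaced by its difference from the top-left pixel of the
    non-overlapping block x block patch it belongs to (an assumed reference
    choice). Works for (H, W) and (H, W, C) arrays.
    """
    h, w = image.shape[:2]
    # Crop so that height and width are multiples of the block size.
    h, w = h - h % block, w - w % block
    img = image[:h, :w].astype(np.float32)
    # Pick one reference pixel per block (top-left corner).
    ref = img[::block, ::block]
    # Broadcast each reference pixel back over its block.
    ref_up = np.repeat(np.repeat(ref, block, axis=0), block, axis=1)
    return img - ref_up


# An image produced by 2x nearest-neighbor upsampling has identical pixels
# inside every 2x2 block, so its map is exactly zero.
small = np.arange(6, dtype=np.float32).reshape(2, 3)
upsampled = np.repeat(np.repeat(small, 2, axis=0), 2, axis=1)
zero_map = npr_map(upsampled)

# A generic image keeps non-zero local differences.
ramp = np.arange(16, dtype=np.float32).reshape(4, 4)
ramp_map = npr_map(ramp)
```

In this toy case the nearest-neighbor upsampled image yields an all-zero map, while the ramp image does not. Real generators use learned layers after upsampling, so the traces are subtler; in the approach described above, a detector is trained on this kind of representation rather than on raw pixels.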
Datasets
ForenSynths (training), ForenSynths (testing), Self-Synthesis GANs, DIRE (diffusions), Ojha (diffusions), Self-Synthesis Diffusions (including DALLE and Midjourney), LSUN, ImageNet, CelebA, CelebA-HQ, COCO, FaceForensics++, LAION
Model(s)
A lightweight CNN with convolutional layers and ResNet blocks. Baselines included CNNDetection, Frank, Durall, Patchfor, F3Net, SelfBland, GANDetection, BiHPF, FrePGAN, LGrad, and Ojha.
Author countries
China, Singapore