Towards Robust GAN-generated Image Detection: a Multi-view Completion Representation

Authors: Chi Liu, Tianqing Zhu, Sheng Shen, Wanlei Zhou

Published: 2023-06-02 08:38:02+00:00

AI Summary

This paper proposes a robust GAN-generated image detection framework using multi-view image completion. The framework learns diverse distributions of genuine images to represent frequency-irrelevant features, improving generalization and robustness against unknown GANs and perturbations.

Abstract

GAN-generated image detection has become the first line of defense against malicious uses of machine-synthesized image manipulations such as deepfakes. Although some existing detectors work well on clean, known GAN samples, their success is largely attributable to overfitting to unstable features such as frequency artifacts, which causes failures when facing unknown GANs or perturbation attacks. To overcome this issue, we propose a robust detection framework based on a novel multi-view image completion representation. The framework first learns various view-to-image tasks to model the diverse distributions of genuine images. Frequency-irrelevant features can be represented from the distributional discrepancies characterized by the completion models, which are stable, generalized, and robust for detecting unknown fake patterns. Then, a multi-view classification is devised with elaborated intra- and inter-view learning strategies to enhance view-specific feature representation and cross-view feature aggregation, respectively. We evaluated the generalization ability of our framework across six popular GANs at different resolutions and its robustness against a broad range of perturbation attacks. The results confirm our method's improved effectiveness, generalization, and robustness over various baselines.


Key findings
The proposed method outperforms existing baselines in cross-GAN and cross-domain detection. It shows improved robustness against various perturbation attacks, including adversarial examples. The multi-view completion effectively reduces reliance on unstable frequency features, leading to better generalization.
Approach
The approach trains multiple view-to-image completion models on real images only, so that each model captures a different aspect of the genuine image distribution. Because fake images deviate from this distribution, their completions show larger discrepancies than those of real images. A multi-view classifier then aggregates these distributional discrepancies across views for detection, relying on frequency-irrelevant features rather than fragile frequency artifacts.
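To make the idea concrete, here is a minimal NumPy sketch of the completion-residual representation, not the paper's pipeline: the paper uses learned U-Net restorers and an Xception classifier, whereas `mean_fill` below is a toy stand-in restorer, and the view construction (random pixel masking) and residual statistics are illustrative assumptions.

```python
import numpy as np

def make_views(img, n_views=4, mask_frac=0.3, seed=0):
    """Create masked 'views' of an image by hiding a random pixel subset.
    (Toy stand-in for the paper's view-to-image tasks.)"""
    rng = np.random.default_rng(seed)
    h, w = img.shape[:2]
    views = []
    for _ in range(n_views):
        mask = rng.random((h, w)) > mask_frac  # True = visible pixel
        views.append((img * mask[..., None], mask))
    return views

def mean_fill(view, mask):
    """Toy restorer: fill hidden pixels with the mean of visible pixels.
    The actual framework trains a U-Net on real images for this step."""
    out = view.copy()
    out[~mask] = view[mask].mean(axis=0)
    return out

def completion_residuals(img, restorer, n_views=4):
    """Per-view residual statistics between each view's completion and
    the original image; these form the detection feature vector."""
    feats = []
    for view, mask in make_views(img, n_views):
        completed = restorer(view, mask)
        resid = np.abs(completed - img)[~mask]  # error on hidden region
        feats.extend([resid.mean(), resid.std()])
    return np.array(feats)

# A real image (or a GAN sample) yields a fixed-length feature vector;
# a classifier trained on such vectors separates real from fake.
img = np.random.default_rng(1).random((32, 32, 3))
feats = completion_residuals(img, mean_fill)
print(feats.shape)  # (8,) = 2 statistics x 4 views
```

The intuition is that restorers fit only to genuine images complete real inputs with small, stable residuals, while GAN-generated inputs produce systematically larger discrepancies across views.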
Datasets
CelebA and CelebA-HQ (real images); fake images generated by ProGAN, CramerGAN, SNGAN, MMDGAN, StyleGAN, and StyleGAN2.
Model(s)
U-Net (for restorers), Xception (for classifiers)
Author countries
Australia (3 authors), China (1 author)