Watch your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce Spectral Distributions

Authors: Ricard Durall, Margret Keuper, Janis Keuper

Published: 2020-03-03 23:04:33+00:00

AI Summary

This paper demonstrates that common upsampling methods in generative convolutional neural networks (like GANs) fail to reproduce the spectral distributions of real data, leading to easily detectable artifacts in generated images. The authors propose a novel spectral regularization term added to the GAN training objective to mitigate this issue, improving both the spectral consistency and the overall quality of generated images.

Abstract

Generative convolutional deep neural networks, e.g. popular GAN architectures, rely on convolution-based up-sampling methods to produce non-scalar outputs such as images or video sequences. In this paper, we show that common up-sampling methods, known as up-convolution or transposed convolution, cause such models to fail to reproduce the spectral distributions of natural training data correctly. This effect is independent of the underlying architecture, and we show that it can be used to easily detect generated data such as deepfakes with up to 100% accuracy on public benchmarks. To overcome this drawback of current generative models, we propose adding a novel spectral regularization term to the training optimization objective. We show that this approach not only allows training spectrally consistent GANs that avoid high-frequency errors, but also that a correct approximation of the frequency spectrum has positive effects on the training stability and output quality of generative networks.


Key findings
Generated images exhibit significant spectral distortions detectable with high accuracy (up to 100% on public benchmarks). Adding the proposed spectral regularization term improves the spectral consistency and visual quality of GAN-generated images. The regularization also enhances GAN training stability, mitigating mode collapse.
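The detection result rests on a 1-D spectral feature: the azimuthally averaged magnitude of the 2-D Fourier transform of an image. A minimal sketch of such a feature extractor, assuming a grayscale input and integer-radius binning (the exact normalization in the authors' code may differ):

```python
import numpy as np

def azimuthal_spectrum(img):
    """1-D spectral profile: FFT magnitude averaged over rings of equal radius.

    Sketch of the kind of feature a spectral deepfake detector can use;
    normalization details are an assumption, not the paper's exact code.
    """
    f = np.fft.fftshift(np.fft.fft2(img))   # center the zero frequency
    mag = np.abs(f)
    h, w = img.shape
    cy, cx = h // 2, w // 2
    y, x = np.indices((h, w))
    r = np.sqrt((y - cy) ** 2 + (x - cx) ** 2).astype(int)
    # average magnitude over all pixels sharing the same integer radius
    sums = np.bincount(r.ravel(), weights=mag.ravel())
    counts = np.bincount(r.ravel())
    profile = sums / counts
    # keep only radii whose full circle fits inside the image
    return profile[: min(cy, cx)]
```

Profiles like this can then be fed to a simple classifier (the paper uses SVMs and K-Means); generated images show a characteristic distortion in the high-frequency tail of the profile.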
Approach
The authors analyze the spectral distortions caused by upsampling methods (up-convolution and transposed convolution) in GANs. They propose a spectral regularization term added to the generator loss function during training to correct these distortions. This regularization term minimizes the difference between the spectral distributions of generated and real images.
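A minimal numpy sketch of such a regularization term, assuming (as an illustration, not the authors' exact formulation) that it is a binary cross-entropy between the normalized 1-D spectral profile of a generated image and the mean profile of real training images; the function names are hypothetical:

```python
import numpy as np

def spectral_profile(img):
    # azimuthally averaged FFT magnitude, normalized to (0, 1]
    f = np.fft.fftshift(np.fft.fft2(img))
    mag = np.abs(f)
    h, w = img.shape
    y, x = np.indices((h, w))
    r = np.sqrt((y - h // 2) ** 2 + (x - w // 2) ** 2).astype(int)
    prof = np.bincount(r.ravel(), weights=mag.ravel()) / np.bincount(r.ravel())
    prof = prof[: min(h, w) // 2]
    return prof / prof.max()

def spectral_reg_loss(fake_img, real_mean_profile, eps=1e-8):
    """Cross-entropy between the fake image's spectrum and the real mean spectrum.

    Added (weighted) to the generator objective so that generated images
    are pushed toward the average frequency spectrum of the training data.
    """
    p = np.clip(spectral_profile(fake_img), eps, 1 - eps)
    q = np.clip(real_mean_profile, eps, 1 - eps)
    return float(-np.mean(q * np.log(p) + (1 - q) * np.log(1 - p)))
```

The loss is minimized exactly when the generated profile matches the target profile, so gradients through the FFT penalize the high-frequency errors that standard up-convolutions introduce.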
Datasets
CelebA, FaceForensics++, Faces-HQ (a new dataset created by the authors)
Model(s)
GAN architectures (DCGAN, DRAGAN, LSGAN, WGAN-GP), Autoencoders (AEs), Support Vector Machines (SVMs), K-Means
Author countries
Germany