Exploring Adversarial Fake Images on Face Manifold

Authors: Dongze Li, Wei Wang, Hongxing Fan, Jing Dong

Published: 2021-01-09 02:08:59+00:00

AI Summary

This paper proposes a novel method for generating adversarial fake face images that can bypass deepfake detection models. Instead of adding noise, the approach optimizes latent vectors within a Style-GAN's latent space to generate visually realistic fake images that are misclassified as real by state-of-the-art detectors.

Abstract

Images synthesized by powerful generative adversarial network (GAN) based methods have drawn moral and privacy concerns. Although image forensic models have reached great performance in distinguishing fake images from real ones, these models can be easily fooled with a simple adversarial attack. However, noise-adding adversarial samples can themselves arouse suspicion. In this paper, instead of adding adversarial noise, we search for adversarial points on the face manifold to generate anti-forensic fake face images. We iteratively perform gradient descent with small steps in the latent space of a generative model, e.g. Style-GAN, to find an adversarial latent vector; this is similar to a norm-based adversarial attack, but carried out in latent space. The fake images generated by the GAN from these adversarial latent vectors can then defeat mainstream forensic models. For example, they make the accuracy of deepfake detection models based on Xception or EfficientNet drop from over 90% to nearly 0%, while maintaining high visual quality. In addition, we find that manipulating the style vector $z$ or the noise vectors $n$ at different levels has an impact on the attack success rate. The generated adversarial images mainly exhibit changes in facial texture or face attributes.


Key findings
The proposed method reduced the accuracy of deepfake detection models (Xception and EfficientNet) from over 90% to nearly 0% while preserving visual quality. Ensemble attacks further improved the success rate, demonstrating the vulnerability of current detection methods. Manipulating the style vector or the noise vectors at different levels of the Style-GAN generator was also found to affect the attack success rate.
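The ensemble attack mentioned above can be sketched as optimizing against a combined score from several detectors at once, so that a single adversarial latent fools all of them. This is a minimal illustration, not the paper's implementation; the two toy logistic "detectors" below are assumptions standing in for Xception and EfficientNet.

```python
import numpy as np

# Two toy logistic detectors on a 4-dimensional "image"; in the paper these
# would be full Xception/EfficientNet classifiers (an assumption here).
rng = np.random.default_rng(1)
detector_weights = [rng.standard_normal(4) for _ in range(2)]

def ensemble_fake_score(x):
    # Mean fake-probability across all detectors. Descending this combined
    # score with gradient steps attacks every detector simultaneously,
    # which is the idea behind an ensemble attack.
    return np.mean([1.0 / (1.0 + np.exp(-(w @ x))) for w in detector_weights])

x = rng.standard_normal(4)
score = ensemble_fake_score(x)  # a probability in (0, 1)
```

Averaging the scores is one simple way to combine detectors; a weighted sum of per-detector losses would work equally well as the attack objective.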
Approach
The method iteratively performs gradient descent in the latent space of a Style-GAN to find adversarial latent vectors. These vectors generate fake images that maximize the loss of existing deepfake detection models, effectively fooling them while maintaining high visual quality.
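The optimization above can be sketched as a norm-bounded signed-gradient descent over the latent vector rather than over pixels. The snippet below is a simplified illustration under toy assumptions: the linear "generator" and logistic "detector" stand in for the paper's Style-GAN and Xception/EfficientNet models, and the gradient is computed analytically, where a real attack would backpropagate through the GAN.

```python
import numpy as np

# Toy stand-ins (assumptions, not the paper's models): a linear map as the
# "generator" and a logistic scorer as the "detector".
rng = np.random.default_rng(0)
W_gen = rng.standard_normal((16, 8))   # "generator": latent (8-d) -> image (16-d)
w_det = rng.standard_normal(16)        # "detector" weights

def generate(z):
    return W_gen @ z                   # fake image from a latent vector

def detect(x):
    # Probability that the image is fake; the attack drives this toward 0.
    return 1.0 / (1.0 + np.exp(-(w_det @ x)))

def latent_attack(z, steps=50, alpha=0.02, eps=0.5):
    """Signed gradient descent on the detector's fake-score, taken in latent
    space — analogous to a norm-based pixel attack, but over z."""
    z0, z_adv = z.copy(), z.copy()
    for _ in range(steps):
        score = detect(generate(z_adv))
        # Analytic d(score)/dz for the toy models above; a real
        # implementation would use autodiff through the GAN.
        grad = score * (1.0 - score) * (W_gen.T @ w_det)
        z_adv = z_adv - alpha * np.sign(grad)        # small signed step
        z_adv = np.clip(z_adv, z0 - eps, z0 + eps)   # stay near the original latent
    return z_adv

z = rng.standard_normal(8)
z_adv = latent_attack(z)
# z_adv stays within an L-infinity ball around z, yet yields a lower fake-score.
```

The clipping step keeps the adversarial latent close to the starting point, which is what lets the generated image remain a visually plausible face while the detector's output flips.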
Datasets
FFHQ dataset
Model(s)
Style-GAN, Xception, EfficientNet
Author countries
China