Evading DeepFake Detectors via Adversarial Statistical Consistency

Authors: Yang Hou, Qing Guo, Yihao Huang, Xiaofei Xie, Lei Ma, Jianjun Zhao

Published: 2023-04-23 14:40:42+00:00

AI Summary

This paper proposes StatAttack and its multi-layer variant, MStatAttack, to evade deepfake detectors. The attacks minimize the statistical differences between fake and real images by adversarially adding natural degradations (exposure, blur, and noise), guided by a distribution-aware loss, which improves transferability across various detectors.

Abstract

In recent years, as various realistic face forgery techniques known as DeepFake improve by leaps and bounds, more and more DeepFake detection techniques have been proposed. These methods typically rely on detecting statistical differences between natural (i.e., real) and DeepFake-generated images in both the spatial and frequency domains. In this work, we propose to explicitly minimize these statistical differences to evade state-of-the-art DeepFake detectors. To this end, we propose a statistical consistency attack (StatAttack) against DeepFake detectors, which contains two main parts. First, we select several statistically sensitive natural degradations (i.e., exposure, blur, and noise) and add them to the fake images in an adversarial way. Second, we find that the statistical differences between natural and DeepFake images are positively associated with the distribution shift between the two kinds of images, and we propose a distribution-aware loss to guide the optimization of the different degradations. As a result, the feature distributions of the generated adversarial examples are close to those of natural images. Furthermore, we extend StatAttack to a more powerful version, MStatAttack, which extends the single-layer degradation to multi-layer degradations applied sequentially and uses the loss to jointly tune the combination weights. Comprehensive experimental results on four spatial-based detectors and two frequency-based detectors across four datasets demonstrate the effectiveness of our proposed attack method in both white-box and black-box settings.
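
As a rough illustration of the distribution-aware loss described above, the sketch below implements a Gaussian-kernel maximum mean discrepancy (MMD) between fake and real feature batches in PyTorch. The kernel choice, bandwidth, and function names are assumptions made for illustration, not the authors' exact implementation.

```python
# Minimal sketch of a distribution-aware loss in the spirit of the paper's
# MMD objective; the Gaussian kernel and fixed bandwidth are illustrative
# assumptions, not the authors' exact formulation.
import torch


def gaussian_kernel(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Pairwise Gaussian kernel between two feature batches (N, D) and (M, D)."""
    sq_dists = torch.cdist(x, y) ** 2          # squared Euclidean distances
    return torch.exp(-sq_dists / (2 * sigma ** 2))


def mmd_loss(fake_feats: torch.Tensor, real_feats: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Biased MMD^2 estimate between fake and real feature distributions.

    Minimizing this pulls the feature distribution of the degraded fake
    images toward that of the natural images.
    """
    k_ff = gaussian_kernel(fake_feats, fake_feats, sigma).mean()
    k_rr = gaussian_kernel(real_feats, real_feats, sigma).mean()
    k_fr = gaussian_kernel(fake_feats, real_feats, sigma).mean()
    return k_ff + k_rr - 2 * k_fr
```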


Key findings
StatAttack and MStatAttack effectively evade both spatial and frequency-based deepfake detectors in white-box and black-box settings. MStatAttack shows improved attack success rates and transferability compared to StatAttack and baseline attacks. The methods maintain reasonable image quality.
Approach
StatAttack adversarially adds natural degradations (exposure, blur, noise) to fake images to minimize their statistical differences from real images. MStatAttack extends this by applying multi-layer degradations sequentially and jointly optimizing their combination weights under a distribution-aware loss (MMD), as sketched below.
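
The following hypothetical sketch shows what a multi-layer degradation stack with jointly tuned combination weights might look like, reusing `mmd_loss` from the earlier sketch. The specific layer parameterizations (gamma exposure, learned blur kernel, bounded additive noise), the sigmoid blend weights, and the Adam optimizer are all assumptions; the full attack would also include an adversarial term against the detector's output, omitted here for brevity.

```python
# Hypothetical sketch of an MStatAttack-style multi-layer degradation stack:
# exposure, blur, and noise layers are applied sequentially, and per-layer
# blend weights are tuned jointly with the degradation parameters.
# Layer forms, weight parameterization, and optimizer are assumptions.
import torch
import torch.nn.functional as F


def apply_exposure(img: torch.Tensor, gamma: torch.Tensor) -> torch.Tensor:
    # Gamma-style exposure adjustment, kept in a plausible range.
    return img.clamp(1e-6, 1.0) ** gamma.clamp(0.5, 2.0)


def apply_blur(img: torch.Tensor, kernel: torch.Tensor) -> torch.Tensor:
    # Depthwise blur with a learned, softmax-normalized k x k kernel.
    k = F.softmax(kernel.flatten(), dim=0).view(1, 1, *kernel.shape)
    k = k.expand(3, 1, -1, -1)
    return F.conv2d(img, k, padding=kernel.shape[-1] // 2, groups=3)


def apply_noise(img: torch.Tensor, noise: torch.Tensor, scale: float) -> torch.Tensor:
    # Bounded additive noise via tanh squashing.
    return img + scale * torch.tanh(noise)


def mstat_forward(img: torch.Tensor, params: dict) -> torch.Tensor:
    """Apply each degradation layer sequentially, blended by learned weights."""
    weights = torch.sigmoid(params["weights"])  # per-layer blend weight in (0, 1)
    degradations = [
        lambda x: apply_exposure(x, params["gamma"]),
        lambda x: apply_blur(x, params["kernel"]),
        lambda x: apply_noise(x, params["noise"], 0.03),
    ]
    out = img
    for w, deg in zip(weights, degradations):
        out = w * deg(out) + (1 - w) * out
    return out.clamp(0.0, 1.0)


def attack(fake_imgs, real_feats, feature_extractor, steps: int = 100, lr: float = 1e-2):
    """Optimize degradation parameters so fake features match real features.

    `feature_extractor` is a stand-in for whatever network produces the
    statistics being matched; a full attack would add an adversarial loss
    against the target detector as well.
    """
    params = {
        "gamma": torch.ones(1, requires_grad=True),
        "kernel": torch.zeros(5, 5, requires_grad=True),
        "noise": torch.zeros_like(fake_imgs, requires_grad=True),
        "weights": torch.zeros(3, requires_grad=True),
    }
    opt = torch.optim.Adam(
        [params["gamma"], params["kernel"], params["noise"], params["weights"]], lr=lr
    )
    for _ in range(steps):
        adv = mstat_forward(fake_imgs, params)
        loss = mmd_loss(feature_extractor(adv), real_feats)  # from earlier sketch
        opt.zero_grad()
        loss.backward()
        opt.step()
    return mstat_forward(fake_imgs, params).detach()
```

Blending each layer as `w * degraded + (1 - w) * previous` keeps the whole stack differentiable, so the combination weights and the degradation parameters can be tuned jointly by gradient descent, matching the joint-optimization idea described above.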
Datasets
StyleGANv2, ProGAN, StarGAN, FaceForensics++, CelebA, CelebA-HQ, FFHQ
Model(s)
ResNet50, EfficientNet-b4, DenseNet, MobileNet, DCTA, DFTD
Author countries
Japan, Singapore, Canada