AVA: Inconspicuous Attribute Variation-based Adversarial Attack bypassing DeepFake Detection

Authors: Xiangtao Meng, Li Wang, Shanqing Guo, Lei Ju, Qingchuan Zhao

Published: 2023-12-14 06:25:56+00:00

AI Summary

This paper introduces a novel attribute-variation-based adversarial attack (AVA) that manipulates inconspicuous image attributes (e.g., mouth open) to bypass state-of-the-art DeepFake detection algorithms. AVA perturbs the latent space of DeepFake images, achieving a success rate above 95% on two commercial detectors while remaining largely imperceptible to humans.

Abstract

While DeepFake applications have become popular in recent years, their abuse poses a serious privacy threat. Unfortunately, most detection algorithms designed to mitigate this abuse are inherently vulnerable to adversarial attacks because they are built atop DNN-based classification models, and the literature has demonstrated that they can be bypassed by introducing pixel-level perturbations. Although corresponding mitigations have been proposed, we have identified a new attribute-variation-based adversarial attack (AVA) that perturbs the latent space via a combination of a Gaussian prior and a semantic discriminator to bypass such mitigation. It perturbs the semantics in the attribute space of DeepFake images in ways that are inconspicuous to human beings (e.g., mouth open) but can result in substantial differences in DeepFake detection. We evaluate the proposed AVA attack on nine state-of-the-art DeepFake detection algorithms and applications. The empirical results demonstrate that the AVA attack defeats state-of-the-art black-box attacks against DeepFake detectors and achieves more than a 95% success rate on two commercial DeepFake detectors. Moreover, our human study indicates that AVA-generated DeepFake images are often imperceptible to humans, which raises serious security and privacy concerns.


Key findings
AVA successfully bypassed nine state-of-the-art DeepFake detectors, including commercial black-box systems, achieving a success rate above 95% on two commercial detectors. A human study showed that the manipulated images are often imperceptible to humans, highlighting the significant security and privacy risks this attack poses.
Approach
AVA uses a GAN-based generator to manipulate inconspicuous attributes in the latent space of DeepFake images. It optimizes latent codes under a Gaussian prior, which keeps the perturbed code close to the natural-image manifold, and a semantic discriminator that constrains attribute variations to a plausible range, so the manipulated images remain natural-looking and harder to detect. A minimal sketch of this optimization appears below.
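To make the optimization concrete, here is a minimal PyTorch-style sketch of a latent-code attack of this kind. All component handles (`generator`, `detector`, `sem_disc`) and the loss weights are hypothetical placeholders, not the authors' implementation; the sketch also assumes gradient (white-box) access to the detector for illustration, whereas the paper additionally targets black-box and commercial detectors.

```python
import torch

def ava_attack(generator, detector, sem_disc, w0,
               steps=200, lr=0.01, lam_prior=0.1, lam_sem=1.0):
    """Sketch of an attribute-variation attack in latent space.

    Hypothetical components (placeholders, not the released code):
      generator(w) -> image, e.g. a StyleGAN synthesis network
      detector(img) -> probability in [0, 1] that img is fake
      sem_disc(img) -> probability in [0, 1] that img looks semantically natural
    w0 is the latent code of the DeepFake image to be perturbed.
    """
    delta = torch.zeros_like(w0, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        w = w0 + delta
        img = generator(w)

        # Adversarial objective: push the detector's "fake" score toward 0.
        loss_adv = -torch.log(1 - detector(img) + 1e-8).mean()

        # Gaussian prior: keep the perturbed latent close to the prior,
        # so the edit stays near the natural-image manifold.
        loss_prior = (w ** 2).mean()

        # Semantic discriminator: penalize attribute variations that no
        # longer look natural (e.g. an implausibly wide-open mouth).
        loss_sem = -torch.log(sem_disc(img) + 1e-8).mean()

        loss = loss_adv + lam_prior * loss_prior + lam_sem * loss_sem
        opt.zero_grad()
        loss.backward()
        opt.step()

    return generator(w0 + delta).detach()
```

The key design choice this illustrates is that the perturbation lives in attribute (latent) space rather than pixel space, so pixel-level adversarial defenses do not directly apply, while the two regularizers keep the resulting variation inconspicuous to humans.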
Datasets
StyleGAN-generated images and FaceForensics++
Model(s)
UNKNOWN (the paper evaluates the attack against various DeepFake detection models, not a specific model used for the attack itself)
Author countries
China, Hong Kong