Restricted Black-box Adversarial Attack Against DeepFake Face Swapping

Authors: Junhao Dong, Yuan Wang, Jianhuang Lai, Xiaohua Xie

Published: 2022-04-26 14:36:06+00:00

AI Summary

This paper introduces a novel restricted black-box adversarial attack against DeepFake face swapping, using a substitute model and a Transferable Cycle Adversary Generative Adversarial Network (TCA-GAN) to generate transferable adversarial examples without querying the target model. A post-regularization module further enhances transferability, leading to significant disruption of DeepFake outputs.

Abstract

DeepFake face swapping, which can replace the source face in an arbitrary photo/video with the target face of an entirely different person, presents a significant threat to online security and social media. In order to prevent this fraud, some researchers have begun to study adversarial methods against DeepFake or face manipulation. However, existing works focus on the white-box setting or a black-box setting driven by abundant queries, which severely limits the practical application of these methods. To tackle this problem, we introduce a practical adversarial attack that does not require any queries to the facial image forgery model. Our method is built on a substitute model trained for face reconstruction and then transfers adversarial examples from the substitute model directly to inaccessible black-box DeepFake models. Specifically, we propose the Transferable Cycle Adversary Generative Adversarial Network (TCA-GAN) to construct the adversarial perturbation for disrupting unknown DeepFake systems. We also present a novel post-regularization module for enhancing the transferability of generated adversarial examples. To comprehensively measure the effectiveness of our approaches, we construct a challenging benchmark of DeepFake adversarial attacks for future development. Extensive experiments show that the proposed adversarial attack method makes the visual quality of DeepFake face images plummet, so that they are more easily detected by humans and algorithms. Moreover, we demonstrate that the proposed algorithm can be generalized to offer face image protection against various face translation methods.


Key findings
The proposed attack effectively degrades the visual quality of DeepFake face images, making them easier to detect by humans and algorithms. The method generalizes well to other face translation methods. The adversarial examples enhance the performance of several image-level DeepFake detection methods.
Approach
The authors propose a restricted black-box attack built on a substitute model (an autoencoder) trained for face reconstruction. They train a TCA-GAN to generate adversarial perturbations against the substitute model and transfer them to the target DeepFake model without any access to or queries of it. A post-regularization module further improves transferability, as sketched below.
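To make the pipeline concrete, below is a minimal PyTorch sketch of the general idea: train a perturbation generator against a substitute face-reconstruction autoencoder so the resulting adversarial faces transfer to an unseen DeepFake model. The module names (`TCAGenerator`, `substitute_ae`), the loss weighting, and the perturbation budget are assumptions, not the authors' code; the paper's full TCA-GAN additionally uses cycle-adversarial training and the post-regularization module, which are omitted here.

```python
# Minimal sketch (not the authors' implementation): attack a substitute
# face-reconstruction autoencoder so the perturbation transfers to an
# unseen black-box DeepFake model.
import torch
import torch.nn as nn

class TCAGenerator(nn.Module):
    """Hypothetical perturbation generator standing in for TCA-GAN's generator."""
    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, channels, 3, padding=1), nn.Tanh(),
        )

    def forward(self, x: torch.Tensor, epsilon: float = 8 / 255) -> torch.Tensor:
        # Bound the perturbation with an L-infinity budget (assumed value)
        # and keep the adversarial image in the valid pixel range.
        return torch.clamp(x + epsilon * self.net(x), 0.0, 1.0)

def attack_step(generator, substitute_ae, faces, optimizer):
    """One training step: make the substitute autoencoder fail to reconstruct
    the perturbed face while keeping the perturbation visually small."""
    adv_faces = generator(faces)
    recon = substitute_ae(adv_faces)
    # Disruption term: push the substitute reconstruction away from the clean face.
    disruption = -nn.functional.mse_loss(recon, faces)
    # Fidelity term: keep the adversarial image close to the original.
    fidelity = nn.functional.mse_loss(adv_faces, faces)
    loss = disruption + 10.0 * fidelity  # weighting is an assumption
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At inference time, the trained generator would be applied once to a face image before it is shared, with the expectation that the perturbation also disrupts DeepFake models that were never queried during training.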
Datasets
A custom DeepFake dataset of 6,274 images (40 men, 38 women), plus CelebA for generalization tests.
Model(s)
TCA-GAN (Transferable Cycle Adversary Generative Adversarial Network), Autoencoder (substitute model), various DeepFake models (not specified), StarGAN, AttGAN.
Author countries
China