Real is not True: Backdoor Attacks Against Deepfake Detection

Authors: Hong Sun, Ziqiang Li, Lei Liu, Bin Li

Published: 2024-03-11 10:57:14+00:00

AI Summary

This paper introduces Bad-Deepfake, a novel backdoor attack against deepfake detection models. By strategically manipulating a subset of training data, Bad-Deepfake achieves a 100% attack success rate against existing detectors, highlighting their vulnerability to this type of attack.

Abstract

The proliferation of malicious deepfake applications has ignited substantial public apprehension, casting a shadow of doubt upon the integrity of digital media. Despite the development of proficient deepfake detection mechanisms, they persistently demonstrate pronounced vulnerability to an array of attacks. It is noteworthy that the pre-existing repertoire of attacks predominantly comprises adversarial example attacks, which manifest chiefly during the testing phase. In the present study, we introduce a pioneering paradigm denominated as Bad-Deepfake, which represents a novel foray into the realm of backdoor attacks levied against deepfake detectors. Our approach hinges upon the strategic manipulation of a delimited subset of the training data, enabling us to wield disproportionate influence over the operational characteristics of a trained model. This manipulation leverages frailties inherent to deepfake detectors, affording us the capacity to engineer triggers and judiciously select the most efficacious samples for the construction of the poisoned set. Through the synergistic amalgamation of these sophisticated techniques, we achieve a remarkable performance: a 100% attack success rate (ASR) against extensively employed deepfake detectors.


Key findings
Bad-Deepfake achieves a 100% attack success rate against state-of-the-art deepfake detection models. The attack maintains high benign accuracy, meaning it does not significantly reduce the model's performance on clean, legitimate inputs. The poisoned images it generates appear visually natural (see the metric sketch below).
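The two metrics quoted above can be made concrete with a short, hedged sketch. This is not the paper's evaluation code; it assumes a binary detector and the label convention 0 = real, 1 = fake, and a caller-supplied `apply_trigger` function, all of which are illustrative assumptions.

```python
import torch

def attack_success_rate(model, fake_images, apply_trigger, target_label=0):
    """Fraction of triggered fake images the detector classifies as real."""
    with torch.no_grad():
        preds = model(apply_trigger(fake_images)).argmax(dim=1)
    return (preds == target_label).float().mean().item()

def benign_accuracy(model, clean_images, labels):
    """Accuracy on untouched inputs; a stealthy backdoor keeps this high."""
    with torch.no_grad():
        preds = model(clean_images).argmax(dim=1)
    return (preds == labels).float().mean().item()
```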
Approach
Bad-Deepfake leverages inherent weaknesses in deepfake detectors to construct backdoor triggers. It then uses a Filtering-and-Updating Strategy (FUS) to select the most influential samples for the poisoned training set; a model trained on this set behaves normally on clean inputs but misclassifies triggered fakes as real (a simplified sketch of the selection loop follows).
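The following is a minimal, hypothetical sketch of how a FUS-style poisoned-sample selection loop could be organized. The trigger (`apply_trigger`), the toy detector, and the use of forgetting events as the importance signal are illustrative assumptions; the paper's actual triggers are derived from detector-specific weaknesses and its detector is SE-ResNeXt.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def apply_trigger(x):
    """Placeholder patch trigger (the paper engineers detector-specific triggers)."""
    x = x.clone()
    x[:, :, -4:, -4:] = 1.0              # 4x4 white patch, bottom-right corner
    return x

class ToyDetector(nn.Module):
    """Stand-in binary real/fake classifier (the paper uses SE-ResNeXt)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)
        self.fc = nn.Linear(8, 2)
    def forward(self, x):
        h = F.adaptive_avg_pool2d(F.relu(self.conv(x)), 1).flatten(1)
        return self.fc(h)

def fus_select(images, labels, pool_idx, budget, rounds=3, epochs=2):
    """Filter-and-update: keep the poisoned samples the model forgets most,
    refill the rest of the budget from the candidate pool, and repeat."""
    selected = pool_idx[torch.randperm(len(pool_idx))[:budget]]
    for _ in range(rounds):
        x, y = images.clone(), labels.clone()
        x[selected] = apply_trigger(x[selected])
        y[selected] = 0                    # target label: "real"
        model = ToyDetector()
        opt = torch.optim.SGD(model.parameters(), lr=0.1)
        prev_correct = torch.zeros(len(selected), dtype=torch.bool)
        forget = torch.zeros(len(selected))
        for _ in range(epochs):
            opt.zero_grad()
            loss = F.cross_entropy(model(x), y)
            loss.backward()
            opt.step()
            with torch.no_grad():
                correct = model(x[selected]).argmax(1).eq(y[selected])
            forget += (prev_correct & ~correct).float()   # forgetting events
            prev_correct = correct
        keep = selected[forget.argsort(descending=True)[: budget // 2]]
        refill = pool_idx[torch.randperm(len(pool_idx))[: budget - len(keep)]]
        selected = torch.unique(torch.cat([keep, refill]))[:budget]
    return selected

if __name__ == "__main__":
    imgs = torch.rand(64, 3, 32, 32)             # stand-in face crops
    lbls = torch.randint(0, 2, (64,))            # 0 = real, 1 = fake
    fake_idx = (lbls == 1).nonzero().squeeze(1)  # only fakes are candidates
    poisoned = fus_select(imgs, lbls, fake_idx, budget=8)
    print("selected poisoned indices:", poisoned.tolist())
```

The design intuition, under these assumptions, is that samples whose backdoor behaviour the model repeatedly forgets contribute most to implanting the trigger, so retaining them lets a small poisoning budget reach a high attack success rate.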
Datasets
FaceForensics++ dataset, including DeepFakes, Face2Face, FaceShifter, FaceSwap, and NeuralTextures manipulations.
Model(s)
SE-ResNeXt
Author countries
China