Boosting Active Defense Persistence: A Two-Stage Defense Framework Combining Interruption and Poisoning Against Deepfake

Authors: Hongrui Zheng, Yuezun Li, Liejun Wang, Yunfeng Diao, Zhiqing Guo

Published: 2025-08-11 09:26:48+00:00

AI Summary

This paper addresses the short-lived effectiveness of active deepfake defenses by proposing a Two-Stage Defense Framework (TSDF). TSDF combines interruption and poisoning via dual-function adversarial perturbations that both distort forged content and block the attacker's model from adapting through retraining, ensuring long-term defense effectiveness.

Abstract

Active defense strategies have been developed to counter the threat of deepfake technology. However, a primary challenge is their lack of persistence: attackers can bypass these defenses simply by collecting protected samples and retraining their models, so static defenses inevitably fail, which severely limits their practical use. We argue that an effective defense must not only distort forged content but also block the model's ability to adapt when attackers retrain on protected images. To achieve this, we propose an innovative Two-Stage Defense Framework (TSDF). Built on the intensity separation mechanism designed in this paper, the framework uses dual-function adversarial perturbations that play two roles. First, they directly distort the forged results. Second, they act as a poisoning vehicle that disrupts the data preparation process essential to an attacker's retraining pipeline. By poisoning the data source, TSDF aims to prevent the attacker's model from adapting to the defensive perturbations, ensuring the defense remains effective long-term. Comprehensive experiments show that the performance of traditional interruption methods degrades sharply when subjected to adversarial retraining, whereas our framework shows a strong dual defense capability that improves the persistence of active defense. Our code will be available at https://github.com/vpsg-research/TSDF.


Key findings
Traditional interruption methods show drastically reduced effectiveness after adversarial retraining. TSDF demonstrates strong dual defense capability, significantly improving the persistence of active defense against adaptive attackers. The poisoning component of TSDF successfully disrupts face detection, hindering the attacker's retraining process.
Approach
TSDF uses dual-function adversarial perturbations. These perturbations directly distort forged images during inference (interruption) and disrupt face detection during the attacker's retraining process (poisoning). An intensity separation mechanism ensures the two functions operate synergistically.
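This summary does not spell out how the intensity separation mechanism works internally. As a rough illustrative sketch only, one way to read "intensity separation" is splitting a single perturbation into a low-intensity band (carrying the interruption signal) and a high-intensity residual (carrying the poisoning signal); the function name and the band threshold `eps_low` below are hypothetical, not taken from the paper.

```python
import numpy as np

def split_perturbation(delta, eps_low):
    """Hypothetical intensity split: clip the perturbation to a
    low-intensity band (interruption role) and keep the residual
    as the high-intensity band (poisoning role)."""
    delta_interrupt = np.clip(delta, -eps_low, eps_low)
    delta_poison = delta - delta_interrupt
    return delta_interrupt, delta_poison

# Toy perturbation for a 3x64x64 image, budget 8/255 (common in
# adversarial-example work, assumed here for illustration).
rng = np.random.default_rng(0)
delta = rng.uniform(-8 / 255, 8 / 255, size=(3, 64, 64)).astype(np.float32)
d_int, d_poi = split_perturbation(delta, eps_low=4 / 255)

# The two bands recombine exactly into the original perturbation,
# and the interruption band respects the tighter budget.
assert np.allclose(d_int + d_poi, delta)
assert np.abs(d_int).max() <= 4 / 255 + 1e-8
```

The point of the sketch is only that one perturbation tensor can serve two roles at different intensity scales, which is consistent with the dual-function design described above; the paper's actual mechanism may differ.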
Datasets
CelebA, LFW, FaceForensics++ original videos (FF++O)
Model(s)
StarGAN, AGGAN, AttGAN, HiSD
Author countries
China