LEAT: Towards Robust Deepfake Disruption in Real-World Scenarios via Latent Ensemble Attack

Authors: Joonkyo Shim, Hyunsoo Yoon

Published: 2023-07-04 07:00:37+00:00

AI Summary

The paper introduces Latent Ensemble Attack (LEAT), a novel method for robustly disrupting deepfake generation by attacking the latent encoding process rather than the output image. This target attribute-agnostic approach improves robustness and transferability across different deepfake models (GANs and Diffusion Models).

Abstract

Deepfakes, malicious visual contents created by generative models, pose an increasingly harmful threat to society. To proactively mitigate deepfake damages, recent studies have employed adversarial perturbation to disrupt deepfake model outputs. However, previous approaches primarily focus on generating distorted outputs based on only predetermined target attributes, leading to a lack of robustness in real-world scenarios where target attributes are unknown. Additionally, the transferability of perturbations between two prominent generative models, Generative Adversarial Networks (GANs) and Diffusion Models, remains unexplored. In this paper, we emphasize the importance of target attribute-transferability and model-transferability for achieving robust deepfake disruption. To address this challenge, we propose a simple yet effective disruption method called Latent Ensemble ATtack (LEAT), which attacks the independent latent encoding process. By disrupting the latent encoding process, it generates distorted output images in subsequent generation processes, regardless of the given target attributes. This target attribute-agnostic attack ensures robust disruption even when the target attributes are unknown. Additionally, we introduce a Normalized Gradient Ensemble strategy that effectively aggregates gradients for iterative gradient attacks, enabling simultaneous attacks on various types of deepfake models, involving both GAN-based and Diffusion-based models. Moreover, we demonstrate the insufficiency of evaluating disruption quality solely based on pixel-level differences. As a result, we propose an alternative protocol for comprehensively evaluating the success of defense. Extensive experiments confirm the efficacy of our method in disrupting deepfakes in real-world scenarios, reporting a higher defense success rate compared to previous methods.
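The abstract argues that pixel-level differences alone are insufficient for judging disruption success. As a hedged illustration of that idea (not the paper's exact protocol), the sketch below counts a defense as successful only when the disrupted output no longer matches the source identity under an embedding-based similarity check; the identity encoder here is a hypothetical stand-in implemented as a random linear map:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical identity encoder (a stand-in for e.g. a face-recognition
# embedding network); the random linear map is purely illustrative.
W_id = rng.standard_normal((32, 64))

def embed(img):
    """Map a flattened image to a unit-norm identity embedding."""
    v = W_id @ img
    return v / np.linalg.norm(v)

def defense_success(clean_out, disrupted_out, sim_thresh=0.5):
    """Declare the defense successful only if the disrupted output no
    longer carries the source identity (low cosine similarity), rather
    than relying on raw pixel-level L2 distance alone."""
    cos_sim = float(embed(clean_out) @ embed(disrupted_out))
    return cos_sim < sim_thresh

clean = rng.standard_normal(64)       # toy undisrupted model output
disrupted = rng.standard_normal(64)   # toy heavily disrupted output
```

An output identical to the clean one trivially fails this criterion, while a pixel-wise-different output only passes if the identity similarity actually drops below the threshold.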


Key findings
LEAT demonstrates higher defense success rates than previous methods, particularly in gray-box scenarios where target attributes are unknown. LEAT is also significantly faster than prior approaches, and its Normalized Gradient Ensemble improves model-transferability across diverse deepfake models, including both GAN-based and Diffusion-based ones.
Approach
LEAT attacks the latent encoding stage of deepfake generation models, generating distorted outputs regardless of target attributes. A Normalized Gradient Ensemble strategy aggregates gradients from multiple models for improved attack effectiveness across diverse deepfake types.
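The two ideas above can be sketched together: a PGD-style perturbation that maximizes latent-space distortion across several surrogate encoders, with each model's gradient normalized before aggregation so no single model dominates the update. This is a minimal illustrative sketch, not the authors' implementation; the linear "encoders" and the L1 normalization choice are assumptions made for self-containment:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the latent encoders of several deepfake models
# (real LEAT targets encoders such as those of StyleCLIP or Diffusion
# Autoencoders; linear maps are used here purely for illustration).
encoders = [rng.standard_normal((8, 16)) for _ in range(3)]

def latent_loss_grad(W, x, delta):
    """Loss = ||W(x + delta) - W x||^2, i.e. how far the perturbed latent
    has moved; returns (loss, analytic gradient w.r.t. delta)."""
    diff = W @ delta
    return float(diff @ diff), 2.0 * W.T @ diff

def leat_attack(x, encoders, eps=0.05, alpha=0.01, steps=20):
    """Iterative sign-gradient attack with a Normalized Gradient Ensemble:
    each surrogate model's gradient is rescaled to unit L1 norm before
    summing, then the step is projected back into the L_inf eps-ball."""
    delta = rng.uniform(-eps, eps, size=x.shape) * 0.1  # small random start
    for _ in range(steps):
        g = np.zeros_like(x)
        for W in encoders:
            _, grad = latent_loss_grad(W, x, delta)
            g += grad / (np.abs(grad).sum() + 1e-12)  # per-model normalization
        delta = np.clip(delta + alpha * np.sign(g), -eps, eps)
    return delta

x = rng.standard_normal(16)           # toy input image (flattened)
delta = leat_attack(x, encoders)
```

Because the attack targets the latent encoding stage rather than a specific output attribute, the same perturbation disrupts whatever attribute the downstream generator is asked to produce.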
Datasets
CelebA-HQ, VoxCeleb
Model(s)
StyleCLIP, Diffusion Autoencoders, SimSwap, ICface, StarGAN (for black-box testing)
Author countries
Republic of Korea