Protecting Against Image Translation Deepfakes by Leaking Universal Perturbations from Black-Box Neural Networks

Authors: Nataniel Ruiz, Sarah Adel Bargal, Stan Sclaroff

Published: 2020-06-11 15:02:27+00:00

AI Summary

This paper presents Leaking Universal Perturbations (LUP), a novel algorithm for efficiently disrupting black-box image translation deepfake generation systems. LUP reduces the number of queries needed to attack an image by leveraging information gathered from initial attacks on a small dataset, cutting the total query count by approximately 30% compared to state-of-the-art black-box attacks.

Abstract

In this work, we develop efficient disruptions of black-box image translation deepfake generation systems. We are the first to demonstrate black-box deepfake generation disruption by presenting image translation formulations of attacks initially proposed for classification models. Nevertheless, a naive adaptation of classification black-box attacks results in a prohibitive number of queries for image translation systems in the real world. We present a frustratingly simple yet highly effective algorithm, Leaking Universal Perturbations (LUP), that significantly reduces the number of queries needed to attack an image. LUP consists of two phases: (1) a short leaking phase where we attack the network using traditional black-box attacks and gather information on successful attacks on a small dataset, and (2) an exploitation phase where we leverage said information to subsequently attack the network with improved efficiency. Our attack reduces the total number of queries necessary to attack GANimation and StarGAN by 30%.
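
For intuition, here is a minimal sketch of how a classification black-box attack such as SimBA might be reformulated to disrupt an image translation system: instead of flipping a label, each accepted step pushes the generator's output on the perturbed image away from its output on the clean image. The names (`generator`, `simba_disrupt`) and hyperparameters are illustrative assumptions, not the authors' released code.

```python
import torch

def simba_disrupt(generator, x, basis, eps=0.2, max_queries=10_000):
    """SimBA-style black-box disruption of an image translation model.

    Each query perturbs the input along one search direction and keeps the
    step only if it increases the L2 distance between the translated output
    of the perturbed image and the translated output of the clean image.
    `generator` is treated as a black box; only forward queries are used.
    """
    with torch.no_grad():
        y_clean = generator(x)                  # one query on the clean image
        delta = torch.zeros_like(x)
        best, queries = 0.0, 1
        for q in basis:                         # candidate search directions
            if queries >= max_queries:
                break
            for sign in (1.0, -1.0):            # try +eps, then -eps
                cand = delta + sign * eps * q
                dist = torch.norm(generator(x + cand) - y_clean).item()
                queries += 1
                if dist > best:                 # keep steps that disrupt more
                    best, delta = dist, cand
                    break
    return delta, best, queries
```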


Key findings
On GANimation and StarGAN, LUP reduced the average number of queries needed for a successful attack by approximately 30% compared to state-of-the-art black-box attack methods, while achieving high disruption success rates with perturbations that remain imperceptible.
Approach
LUP uses a two-phase approach: a leaking phase, in which traditional black-box attacks are run on a small dataset and the successful perturbations are collected, and an exploitation phase, in which the principal components of those successful perturbations serve as the search directions of a modified SimBA attack, making subsequent attacks on a larger dataset markedly more query-efficient (see the sketch below).
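
A hedged sketch of the exploitation idea under those assumptions: gather the perturbations that succeeded during the leaking phase, compute their principal components, and pass those components as the search basis to the SimBA-style routine shown earlier. `leaked_perturbations` and `lup_basis` are illustrative names, not the paper's released code.

```python
import torch

def lup_basis(leaked_perturbations, num_components=64):
    """Build an exploitation-phase search basis from leaked perturbations.

    leaked_perturbations: tensor of shape (n, C, H, W) holding the successful
    perturbations gathered on the small leaking dataset.
    Returns up to num_components principal directions, reshaped to image size.
    """
    n = leaked_perturbations.shape[0]
    flat = leaked_perturbations.reshape(n, -1)            # (n, C*H*W)
    k = min(num_components, n)
    # Low-rank PCA (centering is handled internally); the columns of V are
    # the principal directions of the successful attack perturbations.
    _, _, V = torch.pca_lowrank(flat, q=k)
    return V.T.reshape(k, *leaked_perturbations.shape[1:])

# Exploitation phase: attack new images by searching along the leaked
# principal components instead of random per-pixel directions, e.g.
#   basis = lup_basis(leaked_perturbations)
#   delta, dist, queries = simba_disrupt(generator, x_new, basis)
```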
Datasets
CelebA dataset
Model(s)
GANimation and StarGAN
Author countries
USA