On Attribution of Deepfakes

Authors: Baiwu Zhang, Jin Peng Zhou, Ilia Shumailov, Nicolas Papernot

Published: 2020-08-20 20:25:18+00:00

AI Summary

This paper proposes a novel approach for attributing deepfakes to their generative models by optimizing over the source of entropy of each model to probabilistically reconstruct the seed used to generate the image. The method achieves high attribution accuracy (97.62%) and shows reduced sensitivity to certain perturbations and adversarial examples.
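
A compact way to state the decision rule this implies (our notation, not the paper's): given a suspect image $x$, candidate generators $G_1, \dots, G_k$ with seed spaces $\mathcal{Z}_m$, and a distance function $d$,

$$\hat{m} \;=\; \arg\min_{m \in \{1,\dots,k\}} \; \min_{z \in \mathcal{Z}_m} \; d\big(G_m(z),\, x\big),$$

that is, the deepfake is attributed to the model whose best-reconstructed seed brings its output closest to $x$.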

Abstract

Progress in generative modeling, especially generative adversarial networks, has made it possible to efficiently synthesize and alter media at scale. Malicious individuals now rely on these machine-generated media, or deepfakes, to manipulate social discourse. In order to ensure media authenticity, existing research is focused on deepfake detection. Yet, the adversarial nature of frameworks used for generative modeling suggests that progress towards detecting deepfakes will enable more realistic deepfake generation. Therefore, it comes as no surprise that developers of generative models are under the scrutiny of stakeholders dealing with misinformation campaigns. At the same time, generative models have many positive applications. As such, there is a clear need to develop tools that ensure the transparent use of generative modeling, while minimizing the harm caused by malicious applications. Our technique optimizes over the source of entropy of each generative model to probabilistically attribute a deepfake to one of the models. We evaluate our method on the seminal example of face synthesis, demonstrating that our approach achieves 97.62% attribution accuracy and is less sensitive to perturbations and adversarial examples. We discuss the ethical implications of our work, identify where our technique can be used, and highlight that a more meaningful legislative framework is required for a more transparent and ethical use of generative modeling. Finally, we argue that model developers should be capable of claiming plausible deniability, and propose a second framework to do so: it allows a model developer to produce evidence that they did not generate the media they are accused of having produced.


Key findings

The proposed method achieves 97.62% attribution accuracy in a benign setting. It shows robustness to some non-adversarial manipulations but is susceptible to larger adversarial perturbations. A user study confirmed high agreement between human judgment and the automated attribution.

Approach

The approach attempts to reconstruct, for each candidate generative model, the seed (latent code) that would have produced the deepfake, by optimizing a distance function between the model's output and the target image. The deepfake is then attributed to the candidate model that yields the closest reconstruction.
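
A minimal sketch of this reconstruct-and-compare loop, in PyTorch, assuming each candidate generator is a callable that maps a latent vector to an image tensor of the same shape as the target. The pixel-space MSE distance, optimizer settings, and single random restart are illustrative choices; the paper only specifies optimizing a distance function over each model's source of entropy.

```python
# Hedged sketch, not the authors' implementation. Assumes PyTorch and
# candidate generators that map a latent vector z to an image tensor.
import torch
import torch.nn.functional as F

def reconstruct_seed(generator, target, latent_dim=512, steps=500, lr=0.05):
    """Optimize a latent code z so that generator(z) approaches `target`.

    Returns the best reconstruction distance found; a lower value means
    this generator explains the target image better.
    """
    z = torch.randn(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    best = float("inf")
    for _ in range(steps):
        opt.zero_grad()
        # Pixel-space MSE as the distance d(G(z), x); an illustrative
        # choice, since the paper only calls for "a distance function".
        loss = F.mse_loss(generator(z), target)
        loss.backward()
        opt.step()
        best = min(best, loss.item())
    return best

def attribute(target, generators):
    """Attribute `target` to the candidate with the closest reconstruction.

    `generators` maps a model name (e.g. "StyleGAN2") to its generator.
    """
    scores = {name: reconstruct_seed(g, target) for name, g in generators.items()}
    return min(scores, key=scores.get), scores
```

In practice one would restart from several random seeds per model, since the reconstruction is probabilistic, and might measure distance in a perceptual feature space rather than raw pixels.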
Datasets

CelebA-HQ, FFHQ

Model(s)

ProgressiveGAN, StyleGAN, StyleGAN2

Author countries

Canada