Adversarial Deepfakes: Evaluating Vulnerability of Deepfake Detectors to Adversarial Examples

Authors: Shehzeen Hussain, Paarth Neekhara, Malhar Jere, Farinaz Koushanfar, Julian McAuley

Published: 2020-02-09 07:10:58+00:00

AI Summary

This research paper demonstrates that state-of-the-art deepfake detection methods, which rely on deep neural networks (DNNs), are vulnerable to adversarial attacks. By adversarially modifying fake videos, the authors show that these detectors can be bypassed, and that the adversarial perturbations survive image and video compression.

Abstract

Recent advances in video manipulation techniques have made the generation of fake videos more accessible than ever before. Manipulated videos can fuel disinformation and reduce trust in media. Therefore, detection of fake videos has garnered immense interest in academia and industry. Recently developed Deepfake detection methods rely on deep neural networks (DNNs) to distinguish AI-generated fake videos from real videos. In this work, we demonstrate that it is possible to bypass such detectors by adversarially modifying fake videos synthesized using existing Deepfake generation methods. We further demonstrate that our adversarial perturbations are robust to image and video compression codecs, making them a real-world threat. We present pipelines in both white-box and black-box attack scenarios that can fool DNN-based Deepfake detectors into classifying fake videos as real.


Key findings
The study reports high success rates in fooling deepfake detectors with both white-box and black-box attacks, even after the adversarial videos are compressed with standard codecs. These robust adversarial examples, resilient to common image- and video-processing operations, pose a significant threat to the reliability of current deepfake detection systems.
Approach
The authors propose white-box and black-box attack pipelines that generate adversarial examples from fake videos. Both pipelines are built on iterative gradient sign methods; the robust variants optimize the expected loss over a distribution of input transformations (e.g., Gaussian blur, additive noise, and compression) so that the adversarial modifications remain effective even after the video is re-encoded, as sketched below.
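
For concreteness, here is a minimal PyTorch sketch of such a robust white-box attack: a targeted iterative gradient sign loop whose gradient is averaged over randomly sampled input transformations (Expectation over Transforms). The `detector` network, the transformation set, and all hyperparameters are illustrative assumptions rather than the authors' released code; a cheap downsample-upsample stands in for blur/compression, since real codecs are not differentiable.

```python
# Sketch of a robust targeted I-FGSM attack with Expectation over Transforms (EOT).
# `detector` is a hypothetical real/fake classifier returning logits of shape (N, 2);
# `frame` is a batch of video frames (N, C, H, W) with pixel values in [0, 1].

import torch
import torch.nn.functional as F

def random_transform(x):
    """Sample one differentiable transformation: additive noise or a blur-like resampling."""
    if torch.rand(1).item() < 0.5:
        return x + 0.02 * torch.randn_like(x)  # additive Gaussian noise
    # downsample then upsample as a cheap, differentiable stand-in for blur/compression
    h, w = x.shape[-2:]
    small = F.interpolate(x, scale_factor=0.5, mode="bilinear", align_corners=False)
    return F.interpolate(small, size=(h, w), mode="bilinear", align_corners=False)

def robust_ifgsm(detector, frame, real_label, eps=0.03, alpha=0.005, steps=40, samples=8):
    """Perturb `frame` so `detector` predicts `real_label` (the "real" class index),
    averaging the loss gradient over `samples` random transformations per step."""
    x_adv = frame.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = 0.0
        for _ in range(samples):
            logits = detector(random_transform(x_adv))
            # targeted attack: minimize cross-entropy w.r.t. the "real" class
            loss = loss + F.cross_entropy(logits, real_label)
        grad = torch.autograd.grad(loss / samples, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv - alpha * grad.sign()               # descend toward target class
            x_adv = frame + (x_adv - frame).clamp(-eps, eps)  # enforce L_inf budget
            x_adv = x_adv.clamp(0.0, 1.0)                     # keep valid pixel range
        x_adv = x_adv.detach()
    return x_adv
```

In the black-box setting the true gradient is unavailable, so the paper replaces it with an estimate computed from the detector's outputs on queried inputs (a Natural Evolution Strategies style estimator); the outer iterative loop is otherwise the same.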
Datasets
FaceForensics++, DeepFake Detection Challenge (DFDC)
Model(s)
XceptionNet, MesoNet, 3D EfficientNet
Author countries
USA