Deepfake Videos in the Wild: Analysis and Detection


Authors: Jiameng Pu, Neal Mangaokar, Lauren Kelly, Parantapa Bhattacharya, Kavya Sundaram, Mobin Javed, Bolun Wang, Bimal Viswanath

Published: 2021-03-07 04:40:15+00:00

AI Summary

This paper introduces DF-W, which the authors describe as the largest dataset of deepfake videos collected in the wild (from YouTube and Bilibili), containing 1,869 videos and over 4.8M frames. The authors analyze these videos and evaluate existing deepfake detection methods on them, finding the methods inadequate for real-world deployment. They also explore transfer learning as a way to improve detection performance, showing that it helps but has limitations.

Abstract

AI-manipulated videos, commonly known as deepfakes, are an emerging problem. Recently, researchers in academia and industry have contributed several (self-created) benchmark deepfake datasets and deepfake detection algorithms. However, little effort has gone towards understanding deepfake videos in the wild, leading to a limited understanding of the real-world applicability of research contributions in this space. Even if detection schemes are shown to perform well on existing datasets, it is unclear how well the methods generalize to real-world deepfakes. To bridge this gap in knowledge, we make the following contributions: First, we collect and present the largest dataset of deepfake videos in the wild, containing 1,869 videos from YouTube and Bilibili, and extract over 4.8M frames of content. Second, we present a comprehensive analysis of the growth patterns, popularity, creators, manipulation strategies, and production methods of deepfake content in the real world. Third, we systematically evaluate existing defenses using our new dataset, and observe that they are not ready for deployment in the real world. Fourth, we explore the potential for transfer learning schemes and competition-winning techniques to improve defenses.

Key findings

Existing deepfake detection methods perform poorly on the real-world deepfakes in the DF-W dataset, with F1 scores below 77%. Transfer learning improves detection, but performance remains inadequate for deployment. The authors also observed racial bias in detection.
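For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below computes it for a binary fake/real classifier. The labels and predictions are made-up illustration values, not results from the paper, and the function name `f1_score` is chosen here for convenience.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return the F1 score for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 1 = fake, 0 = real (illustrative values only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.3f}")
```

In this toy case the detector finds two of the four fakes and raises one false alarm, so precision is 2/3, recall is 1/2, and F1 is 4/7.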

Approach

The authors created a large dataset (DF-W) of deepfake videos from YouTube and Bilibili. They analyzed characteristics of these videos and evaluated existing deepfake detection methods on this dataset, finding low performance. They then explored transfer learning techniques to improve detection accuracy.
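One common form of transfer learning keeps a pretrained feature extractor frozen and retrains only a small classification head on a few labelled target-domain samples. The toy sketch below illustrates that idea in pure Python. It is not the authors' method: `frozen_features` is a hypothetical stand-in for a pretrained backbone, the two-number "samples" stand in for video frames, and all data values are invented for illustration.

```python
import math


def frozen_features(x):
    """Hypothetical pretrained backbone; its parameters are never updated."""
    return [x[0] + x[1], x[0] - x[1]]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune_head(samples, labels, epochs=200, lr=0.5):
    """Train only a logistic-regression head on top of the frozen features."""
    feats = [frozen_features(x) for x in samples]
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = sigmoid(w[0] * f[0] + w[1] * f[1] + b)
            g = p - y  # gradient of the log loss w.r.t. the logit
            w[0] -= lr * g * f[0]
            w[1] -= lr * g * f[1]
            b -= lr * g
    return w, b


def predict(w, b, x):
    f = frozen_features(x)
    return 1 if sigmoid(w[0] * f[0] + w[1] * f[1] + b) >= 0.5 else 0


# Invented target-domain samples: 1 = fake, 0 = real.
samples = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.9), (0.1, 0.2), (0.2, 0.1), (0.3, 0.1)]
labels = [1, 1, 1, 0, 0, 0]

w, b = fine_tune_head(samples, labels)
predictions = [predict(w, b, x) for x in samples]
print(predictions)
```

In practice the backbone would be a deep network such as one of the detectors listed under Model(s), and the head would be tuned on a small labelled subset of the target dataset. The sketch only shows the division of labour between frozen and trainable parts.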

Datasets

DF-W (YouTube and Bilibili), FaceForensics++, UADFV, DeepFakeTIMIT, DFD, Celeb-DF, DFDC

Model(s)

CapsuleForensics, Xception, MesoNet, Multi-Task, VA, FWA, DSP-FWA, Seferbekov model (DFDC winner)

Author countries

USA, Pakistan, China
