The Effectiveness of Temporal Dependency in Deepfake Video Detection

Authors: Will Rowan, Nick Pears

Published: 2022-05-13 14:39:25+00:00

AI Summary

This paper investigates whether incorporating temporal information improves deepfake video detection accuracy. The authors propose a framework that classifies detection approaches by feature extraction (automatic or manual) and temporal dependency (dependent or independent), and use it to compare models. Temporal dependency yields a statistically significant improvement in classifying real images for the model using automatic feature selection.

Abstract

Deepfakes are a form of synthetic image generation used to generate fake videos of individuals for malicious purposes. The resulting videos may be used to spread misinformation, reduce trust in media, or as a form of blackmail. These threats necessitate automated methods of deepfake video detection. This paper investigates whether temporal information can improve the deepfake detection performance of deep learning models. To investigate this, we propose a framework that classifies new and existing approaches by their defining characteristics. These are the types of feature extraction: automatic or manual, and the temporal relationship between frames: dependent or independent. We apply this framework to investigate the effect of temporal dependency on a model's deepfake detection performance. We find that temporal dependency produces a statistically significant (p < 0.05) increase in performance in classifying real images for the model using automatic feature selection, demonstrating that spatio-temporal information can increase the performance of deepfake video detection models.


Key findings
Temporally dependent models significantly outperform temporally independent models (p < 0.05) in classifying real images when using automatic feature selection. The improvement is not statistically significant for the models using manual feature extraction based on face-warping artifacts. These results suggest that incorporating temporal information can improve deepfake video detection accuracy, particularly for models with automatic feature extraction.
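
The summary does not state which statistical test produced the p < 0.05 result. Purely as an illustration of how such a comparison might be run, the following sketch applies a two-sided Welch's t-test (SciPy) to hypothetical per-video accuracies on real videos for a temporally independent and a temporally dependent model; all numbers and variable names are invented for the example.

import numpy as np
from scipy import stats

# Hypothetical per-video accuracies on real videos; these values are invented
# for illustration and are not the paper's results.
acc_independent = np.array([0.71, 0.65, 0.70, 0.68, 0.74, 0.66, 0.69, 0.72])
acc_dependent = np.array([0.80, 0.77, 0.83, 0.79, 0.85, 0.78, 0.81, 0.84])

# Two-sided Welch's t-test (does not assume equal variances).
t_stat, p_value = stats.ttest_ind(acc_dependent, acc_independent, equal_var=False)

print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Difference is statistically significant at the 0.05 level.")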
Approach
The authors propose a framework to classify deepfake detection methods based on feature extraction (automatic or manual) and temporal dependency. They then implement and compare four models (two temporally independent and two dependent) using a subset of the FaceForensics++ dataset, analyzing the impact of temporal information on detection performance.
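
The 2x2 framework can be made concrete with a small data structure. The mapping of specific models to cells below is inferred from the summary (Meso-4 as the automatic-feature model, the ResNet50 face-warping-artifact detector as the manual-feature model) and is an assumption, not the authors' exact table.

from dataclasses import dataclass

@dataclass(frozen=True)
class DetectionApproach:
    name: str
    feature_extraction: str  # "automatic" or "manual"
    temporal: str            # "independent" or "dependent"

# Assumed mapping of the four compared models onto the framework's cells.
approaches = [
    DetectionApproach("Meso-4", "automatic", "independent"),
    DetectionApproach("Meso-4 + LSTM", "automatic", "dependent"),
    DetectionApproach("ResNet50 (face-warping artifacts)", "manual", "independent"),
    DetectionApproach("ResNet50 (face-warping artifacts) + LSTM", "manual", "dependent"),
]

for a in approaches:
    print(f"{a.name:42s} -> ({a.feature_extraction}, {a.temporal})")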
Datasets
FaceForensics++ (the authors use a subset they call TemporalFF++)
Model(s)
Meso-4 (MesoNet), ResNet50, and CNN-LSTM architectures that combine these CNNs with LSTMs to add temporal dependency.
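
A CNN-LSTM combination of this kind can be sketched in PyTorch: per-frame features from a ResNet50 backbone are fed to an LSTM whose final hidden state drives a real/fake classifier. Hidden sizes, sequence length, and the classification head below are illustrative assumptions, not the authors' exact configuration.

import torch
import torch.nn as nn
from torchvision.models import resnet50

class CNNLSTMDetector(nn.Module):
    def __init__(self, lstm_hidden: int = 256):
        super().__init__()
        backbone = resnet50(weights=None)        # randomly initialised backbone (torchvision >= 0.13)
        feat_dim = backbone.fc.in_features       # 2048 for ResNet50
        backbone.fc = nn.Identity()              # use the backbone as a per-frame feature extractor
        self.backbone = backbone
        self.lstm = nn.LSTM(feat_dim, lstm_hidden, batch_first=True)
        self.classifier = nn.Linear(lstm_hidden, 1)  # single real-vs-fake logit

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, channels, height, width)
        b, t, c, h, w = clips.shape
        frames = clips.view(b * t, c, h, w)
        feats = self.backbone(frames).view(b, t, -1)   # per-frame feature vectors
        _, (h_n, _) = self.lstm(feats)                 # last hidden state summarises the clip
        return self.classifier(h_n[-1]).squeeze(-1)    # one logit per clip

# Usage: a batch of 2 clips, 8 frames each, 224x224 RGB.
model = CNNLSTMDetector()
logits = model(torch.randn(2, 8, 3, 224, 224))
print(logits.shape)  # torch.Size([2])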
Author countries
United Kingdom