Deepfake Detection using Spatiotemporal Convolutional Networks

Authors: Oscar de Lima, Sean Franklin, Shreshtha Basu, Blake Karwoski, Annet George

Published: 2020-06-26 01:32:31+00:00

AI Summary

This paper benchmarks spatiotemporal convolutional networks for deepfake detection, addressing the limitation of frame-based methods that ignore temporal information. The proposed methods outperform state-of-the-art frame-based techniques on the Celeb-DF dataset.

Abstract

Better generative models and larger datasets have led to more realistic fake videos that can fool the human eye but produce temporal and spatial artifacts that deep learning approaches can detect. Most current Deepfake detection methods only use individual video frames and therefore fail to learn from temporal information. We created a benchmark of the performance of spatiotemporal convolutional methods using the Celeb-DF dataset. Our methods outperformed state-of-the-art frame-based detection methods. Code for our paper is publicly available at https://github.com/oidelima/Deepfake-Detection.


Key findings
Spatiotemporal convolutional networks significantly improve deepfake detection accuracy compared to frame-based methods. R3D achieved the highest performance, surpassing even the strong I3D model. The results highlight the importance of considering temporal information for robust deepfake detection.
Approach
The authors benchmark several spatiotemporal convolutional network architectures (R3D, I3D, MC3, R(2+1)D, RCN) pre-trained on large video datasets. Rather than classifying each frame independently, these models process short video clips, allowing them to exploit temporal artifacts that frame-by-frame analysis misses.
Datasets
Celeb-DF (v2)
Model(s)
RCN, R3D, ResNet Mixed 3D-2D (MC3), R(2+1)D, I3D
Author countries
USA