A Convolutional LSTM based Residual Network for Deepfake Video Detection

Authors: Shahroz Tariq, Sangyup Lee, Simon S. Woo

Published: 2020-09-16 05:57:06+00:00

AI Summary

This paper proposes CLRNet, a Convolutional LSTM based Residual Network for deepfake video detection that leverages temporal information from consecutive video frames. The approach uses transfer learning to improve generalization across different deepfake methods, outperforming existing state-of-the-art methods on the FaceForensics++ dataset.

Abstract

In recent years, deep learning-based video manipulation methods have become widely accessible to the masses. With little to no effort, people can easily learn how to generate deepfake videos with only a few victim or target images. This creates a significant social problem for everyone whose photos are publicly available on the Internet, especially on social media websites. Several deep learning-based detection methods have been developed to identify these deepfakes. However, these methods lack generalizability, because they perform well only for a specific type of deepfake method. Therefore, those methods are not transferable to detect other deepfake methods. Also, they do not take advantage of the temporal information of the video. In this paper, we addressed these limitations. We developed a Convolutional LSTM based Residual Network (CLRNet), which takes a sequence of consecutive images as an input from a video to learn the temporal information that helps in detecting unnatural-looking artifacts that are present between frames of deepfake videos. We also propose a transfer learning-based approach to generalize across different deepfake methods. Through rigorous experiments using the FaceForensics++ dataset, we showed that our method outperforms five of the previously proposed state-of-the-art deepfake detection methods by better generalizing at detecting different deepfake methods using the same model.


Key findings
CLRNet outperforms five state-of-the-art deepfake detection methods, demonstrating superior generalization across different deepfake methods. Transfer learning significantly improves the model's ability to detect deepfakes generated by methods not present in the initial training data. The model shows strong learning capability and does not overfit to training data.
Approach
CLRNet analyzes sequences of consecutive frames to detect inconsistencies indicative of deepfakes. It employs Convolutional LSTMs to capture temporal information and a residual network architecture to mitigate the vanishing gradient problem. Transfer learning is used to enhance the model's generalizability to various deepfake generation techniques.
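The two building blocks named above can be written out for reference. This is the commonly used ConvLSTM cell without peephole connections, followed by a generic residual connection. The summary does not say which ConvLSTM variant or block layout CLRNet uses, so treat the symbols as assumptions: X_t is the input frame features at step t, H_t the hidden state, C_t the cell state, * a convolution, \circ an element-wise product, and F a stack of layers inside one residual block.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + b_o\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
H_t &= o_t \circ \tanh\left(C_t\right) \\[6pt]
y &= x + F(x), \qquad \frac{\partial y}{\partial x} = I + \frac{\partial F}{\partial x}
\end{aligned}
```

The identity term I in the last line is why a skip connection helps against vanishing gradients: even when \partial F / \partial x is small, the gradient still reaches earlier layers through the shortcut path.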
Datasets
FaceForensics++ dataset (DeepFake, FaceSwap, Face2Face, NeuralTextures, and DeepFakeDetection)
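Because the model consumes sequences of consecutive frames rather than single images, each video would need to be cut into short frame windows before training. The following is a minimal sketch of that step, not the authors' pipeline: the function name `consecutive_windows`, the window length of 5, and the stride are illustrative assumptions, and integer indices stand in for decoded, face-cropped frames.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def consecutive_windows(
    frames: Sequence[Frame], length: int = 5, stride: int = 1
) -> List[List[Frame]]:
    """Split an ordered frame sequence into windows of `length` consecutive frames.

    `length` and `stride` are hypothetical defaults chosen for illustration.
    """
    if length <= 0 or stride <= 0:
        raise ValueError("length and stride must be positive")
    windows: List[List[Frame]] = []
    # Stop once a full window no longer fits; a shorter tail is dropped.
    for start in range(0, len(frames) - length + 1, stride):
        windows.append(list(frames[start:start + length]))
    return windows


# Frame indices stand in for real images in this toy example.
video = list(range(8))
clips = consecutive_windows(video, length=5, stride=1)
for clip in clips:
    print(clip)
```

With `stride=1` the windows overlap heavily, which yields more training samples per video; a stride equal to `length` gives non-overlapping clips instead. Which trade-off suits a given dataset is a design choice this summary does not settle.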
Model(s)
Convolutional LSTM based Residual Network (CLRNet), Xception, ShallowNet, ForensicTransfer (FT)
Author countries
South Korea