Recurrent Convolutional Strategies for Face Manipulation Detection in Videos


Authors: Ekraam Sabir, Jiaxin Cheng, Ayush Jaiswal, Wael AbdAlmageed, Iacopo Masi, Prem Natarajan

Published: 2019-05-02 06:06:25+00:00

AI Summary

This paper proposes an approach for detecting face manipulations in videos that combines recurrent convolutional models with face preprocessing techniques. On the FaceForensics++ dataset, the approach improves on the previous state of the art by up to 4.55% in accuracy, which indicates that temporal information helps video-based manipulation detection.

Abstract

The spread of misinformation through synthetically generated yet realistic images and videos has become a significant problem, calling for robust manipulation detection methods. Despite the predominant effort of detecting face manipulation in still images, less attention has been paid to the identification of tampered faces in videos by taking advantage of the temporal information present in the stream. Recurrent convolutional models are a class of deep learning models which have proven effective at exploiting the temporal information from image streams across domains. We thereby distill the best strategy for combining variations in these models along with domain specific face preprocessing techniques through extensive experimentation to obtain state-of-the-art performance on publicly available video-based facial manipulation benchmarks. Specifically, we attempt to detect Deepfake, Face2Face and FaceSwap tampered faces in video streams. Evaluation is performed on the recently introduced FaceForensics++ dataset, improving the previous state-of-the-art by up to 4.55% in accuracy.

Key findings

DenseNet with bidirectional GRU and landmark-based face alignment yielded the best performance. The use of temporal information through recurrent networks significantly improved detection accuracy. The proposed method outperforms previous state-of-the-art methods by up to 4.55% on the FaceForensics++ dataset.
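To illustrate how a bidirectional recurrent layer uses temporal information, the following is a minimal sketch in plain Python. It is not the authors' implementation: the per-frame feature vectors stand in for CNN outputs, the weights are random rather than trained, biases are omitted, and the names `GRUCell`, `run_gru`, and `bidirectional_summary` are hypothetical. It uses one common GRU formulation; library implementations differ in small details.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply matrix W (list of rows) by vector v.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Single GRU cell with update gate z, reset gate r, candidate state n."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # Untrained random weights, one input and one hidden matrix per gate.
        self.W = {g: rand_matrix(hidden_size, input_size, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_size, hidden_size, rng) for g in "zrn"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in
             zip(matvec(self.W["n"], x), matvec(self.U["n"], reset_h))]
        # Blend the previous state with the candidate state.
        return [zi * hi + (1.0 - zi) * ni for zi, hi, ni in zip(z, h, n)]


def run_gru(cell, frames):
    # Feed the frame features in order and return the final hidden state.
    h = [0.0] * cell.hidden_size
    for x in frames:
        h = cell.step(x, h)
    return h


def bidirectional_summary(fwd_cell, bwd_cell, frames):
    # One pass reads the clip forward, the other reads it backward;
    # the two final states are concatenated into one clip descriptor.
    return run_gru(fwd_cell, frames) + run_gru(bwd_cell, frames[::-1])


if __name__ == "__main__":
    rng = random.Random(0)
    # Three frames, each a 3-dimensional stand-in for a CNN feature vector.
    frames = [[0.1, 0.2, 0.3], [0.0, -0.5, 0.4], [0.9, 0.1, -0.2]]
    fwd, bwd = GRUCell(3, 4, rng), GRUCell(3, 4, rng)
    print(bidirectional_summary(fwd, bwd, frames))
```

In a full detector, the concatenated descriptor would feed a classifier that labels the clip as real or manipulated. A uni-directional variant would keep only the forward pass.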

Approach

The authors combine a recurrent convolutional network with face alignment techniques to detect manipulated faces in video. They compare CNN architectures (ResNet, DenseNet), alignment methods (landmark-based, STN), and recurrent strategies (uni-directional, bi-directional, multi-level) to find the best-performing combination. The final model is trained end-to-end on the FaceForensics++ dataset.
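As a rough illustration of landmark-based alignment, the sketch below computes a similarity transform (rotation, uniform scale, translation) from two eye landmarks so that the eyes end up horizontal and a fixed distance apart. Using only the eye centers is an assumption made here for brevity. The summary does not say which landmarks or transform the authors use, and the function name `eye_alignment_transform` is hypothetical.

```python
import math


def eye_alignment_transform(left_eye, right_eye, target_dist=1.0):
    """Return a function mapping image points into an aligned frame.

    In the aligned frame the midpoint between the eyes is the origin,
    the eyes lie on the x-axis, and they are target_dist apart.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = math.atan2(dy, dx)               # tilt of the eye line
    scale = target_dist / math.hypot(dx, dy)
    cx = (left_eye[0] + right_eye[0]) / 2.0  # midpoint between the eyes
    cy = (left_eye[1] + right_eye[1]) / 2.0
    # Rotate by -angle to undo the tilt, scaled to the target eye distance.
    c = math.cos(-angle) * scale
    s = math.sin(-angle) * scale

    def apply(point):
        x = point[0] - cx
        y = point[1] - cy
        return (c * x - s * y, s * x + c * y)

    return apply


if __name__ == "__main__":
    # Hypothetical landmark coordinates in pixels.
    align = eye_alignment_transform((30.0, 40.0), (70.0, 60.0))
    print(align((30.0, 40.0)), align((70.0, 60.0)))
```

Applying the same transform to every frame of a clip gives the recurrent layers faces in a consistent pose, so differences between frames are less dominated by head motion.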

Datasets

FaceForensics++

Model(s)

ResNet, DenseNet, Bidirectional GRU

Author countries

USA
