Detecting Face2Face Facial Reenactment in Videos

Authors: Prabhat Kumar, Mayank Vatsa, Richa Singh

Published: 2020-01-21 11:03:50+00:00

AI Summary

This paper proposes a multi-stream deep learning network for detecting Face2Face facial reenactment in videos. The network learns regional artifacts from local facial regions and combines them with full-face features for robust detection across various compression levels. This approach achieves state-of-the-art accuracy on the FaceForensics dataset.

Abstract

Visual content has become the primary source of information, as evident in the billions of images and videos shared and uploaded on the Internet every single day. This has led to an increase in alterations of images and videos to make them more informative and eye-catching for viewers worldwide. Some of these alterations are simple, like copy-move, and are easily detectable, while more sophisticated alterations, like reenactment-based DeepFakes, are hard to detect. Reenactment alterations allow a source to change the target's expressions and create photo-realistic images and videos. While the technology can potentially be used for several applications, the malicious usage of automatic reenactment has very large social implications. It is therefore important to develop detection techniques to distinguish real images and videos from altered ones. This research proposes a learning-based algorithm for detecting reenactment-based alterations. The proposed algorithm uses a multi-stream network that learns regional artifacts and provides robust performance at various compression levels. We also propose a loss function for the balanced learning of the streams of the proposed network. The performance is evaluated on the publicly available FaceForensics dataset. The results show state-of-the-art classification accuracy of 99.96%, 99.10%, and 91.20% for no, easy, and hard compression factors, respectively.


Key findings
The proposed approach achieves state-of-the-art accuracy (99.96%, 99.10%, and 91.20% for no, easy, and hard compression, respectively) on the FaceForensics dataset. The multi-stream approach shows improved robustness to compression compared to other methods. The custom loss function effectively balances the training of different streams, enhancing overall performance.
Approach
The authors propose a multi-stream network with five parallel ResNet-18 models. Four streams process local facial regions (2x2 grid), while one processes the full face. A custom loss function balances the training of these streams, improving robustness to compression artifacts.
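The per-stream inputs and the score fusion described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the region extraction assumes non-overlapping quadrants of an aligned face crop, and the equal-weight fusion is an assumption (the summary only states that stream scores are fused).

```python
import numpy as np

def extract_regions(face):
    """Split an aligned face crop of shape (H, W, 3) into the five stream
    inputs: four quadrants from a 2x2 grid plus the full face."""
    h, w = face.shape[0] // 2, face.shape[1] // 2
    quadrants = [
        face[:h, :w],  # top-left
        face[:h, w:],  # top-right
        face[h:, :w],  # bottom-left
        face[h:, w:],  # bottom-right
    ]
    return quadrants + [face]

def fuse_scores(stream_probs):
    """Score-level fusion: average the per-stream class probabilities.
    Equal weighting across the five streams is an assumption."""
    return np.mean(stream_probs, axis=0)

face = np.random.rand(128, 128, 3)
streams = extract_regions(face)

# In the full system each of the five streams would feed its own ResNet-18;
# here we stand in hypothetical per-stream softmax outputs for the
# (real, reenacted) classes.
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.85, 0.15],
                  [0.7, 0.3], [0.95, 0.05]])
fused = fuse_scores(probs)  # final (real, reenacted) score for the frame
```

In the actual network, each element of `streams` would be resized to the ResNet-18 input resolution before being passed through its dedicated backbone.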
Datasets
FaceForensics dataset
Model(s)
Five parallel ResNet-18 models, combined with a custom loss function and score fusion.
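The summary states that a custom loss balances the learning of the five streams but does not give its exact form. A common way to realize balanced multi-stream training is a weighted sum of per-stream cross-entropy terms; the sketch below uses that formulation, with equal weights, purely as an illustrative assumption.

```python
import numpy as np

def cross_entropy(probs, label):
    # Negative log-likelihood of the true class for one stream's output.
    return -np.log(probs[label])

def balanced_multistream_loss(stream_probs, label, weights=None):
    """Weighted sum of per-stream cross-entropy terms.

    `stream_probs` holds one softmax output per stream (five here).
    Equal weights are an assumption; the paper's balancing scheme
    is not specified in this summary."""
    losses = np.array([cross_entropy(p, label) for p in stream_probs])
    if weights is None:
        weights = np.full(len(losses), 1.0 / len(losses))
    return float(np.dot(weights, losses))

# Hypothetical per-stream outputs for a real (label 0) frame.
stream_probs = [np.array([0.9, 0.1])] * 5
loss = balanced_multistream_loss(stream_probs, label=0)
```

Tuning the per-stream weights (or normalizing each stream's gradient magnitude) is one plausible way such a loss could keep a dominant stream, e.g. the full-face one, from overwhelming the regional streams during training.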
Author countries
India