Detecting Face2Face Facial Reenactment in Videos

Authors: Prabhat Kumar, Mayank Vatsa, Richa Singh

Published: 2020-01-21 11:03:50+00:00

Comment: 9 pages

AI Summary

This research proposes a learning-based algorithm for detecting reenactment-based alterations, specifically Face2Face DeepFakes, in videos. It introduces a multi-stream network that learns regional artifacts for robust performance across various compression levels. The paper also proposes a loss function for the balanced learning of the streams within the network, achieving state-of-the-art classification accuracy on the FaceForensics dataset.

Abstract

Visual content has become the primary source of information, as is evident from the billions of images and videos shared and uploaded on the Internet every day. This has led to an increase in alterations to images and videos intended to make them more informative and eye-catching for viewers worldwide. Some of these alterations, such as copy-move, are simple and easily detectable, while sophisticated alterations such as reenactment-based DeepFakes are hard to detect. Reenactment alterations allow a source actor to change the target's expressions and create photo-realistic images and videos. While the technology can potentially be used for several legitimate applications, the malicious use of automatic reenactment has very large social implications. It is therefore important to develop detection techniques that distinguish real images and videos from altered ones. This research proposes a learning-based algorithm for detecting reenactment-based alterations. The proposed algorithm uses a multi-stream network that learns regional artifacts and provides robust performance at various compression levels. We also propose a loss function for the balanced learning of the streams of the proposed network. Performance is evaluated on the publicly available FaceForensics dataset. The results show state-of-the-art classification accuracy of 99.96%, 99.10%, and 91.20% for no, easy, and hard compression factors, respectively.


Key findings

The proposed algorithm achieved state-of-the-art classification accuracy of 99.96%, 99.10%, and 91.20% for no, easy, and hard compression factors, respectively, on the FaceForensics dataset. The multi-stream architecture, aided by the custom loss function, demonstrated superior robustness to compression compared to existing methods. Analysis of class activation maps showed that regional classifiers capture crucial artifacts in areas like the eyes and face boundaries, which are often overlooked or suppressed by full-face models, especially under high compression.
Approach

The proposed method uses a multi-stream deep learning network comprising five parallel ResNet-18 models. Four streams are dedicated to learning local, regional facial artifacts (from a 2x2 grid of the face), while one stream processes the full face. A novel loss function is introduced to facilitate balanced training of these streams, enabling the network to capture both localized and global artifacts for robust detection across various compression levels.
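The stream construction and loss described above can be sketched as follows. This is a minimal, framework-free illustration: the four 2x2 grid regions plus the full face are prepared as five stream inputs, and a "balanced" loss is modeled as a weighted average of per-stream cross-entropy terms. The helper names `make_stream_inputs` and `balanced_stream_loss`, the nearest-neighbour resizing, and the uniform weighting are all assumptions for illustration; the paper's actual ResNet-18 streams, preprocessing, and loss formulation may differ.

```python
import numpy as np

def make_stream_inputs(face, out_size=224):
    """Split a face crop into five stream inputs: the four quadrants of a
    2x2 grid plus the full face, each resized to out_size x out_size
    (nearest-neighbour sampling, for illustration only)."""
    h, w = face.shape[:2]
    mh, mw = h // 2, w // 2
    regions = [
        face[:mh, :mw],  # top-left quadrant
        face[:mh, mw:],  # top-right quadrant
        face[mh:, :mw],  # bottom-left quadrant
        face[mh:, mw:],  # bottom-right quadrant
        face,            # full face (fifth stream)
    ]

    def resize(img):
        ys = np.linspace(0, img.shape[0] - 1, out_size).astype(int)
        xs = np.linspace(0, img.shape[1] - 1, out_size).astype(int)
        return img[np.ix_(ys, xs)]

    return [resize(r) for r in regions]

def balanced_stream_loss(stream_logits, label, weights=None):
    """Hypothetical 'balanced' multi-stream loss: a weighted average of
    per-stream cross-entropy terms, so that no single stream dominates
    training. The paper's exact formulation may differ."""
    n = len(stream_logits)
    if weights is None:
        weights = [1.0 / n] * n  # uniform weighting by default (assumption)
    total = 0.0
    for w, logits in zip(weights, stream_logits):
        z = np.asarray(logits, dtype=float)
        # numerically stable log-softmax
        log_probs = z - z.max() - np.log(np.exp(z - z.max()).sum())
        total += -w * log_probs[label]
    return total
```

For example, an 8x8 face crop split with `out_size=4` yields five 4x4 stream inputs; with uniform binary logits from every stream, the weighted loss reduces to log(2).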
Datasets

FaceForensics, FaceForensics++

Model(s)

ResNet-18

Author countries

India