Improving the Efficiency and Robustness of Deepfakes Detection through Precise Geometric Features

Authors: Zekun Sun, Yujie Han, Zeyu Hua, Na Ruan, Weijia Jia

Published: 2021-04-09 16:57:55+00:00

Comment: IEEE/CVF Conference on Computer Vision and Pattern Recognition 2021 (CVPR 2021)

AI Summary

This paper introduces LRNet, an efficient and robust framework for detecting Deepfake videos by applying temporal modeling to precise geometric features. It proposes a novel calibration module to enhance the precision of facial landmarks and employs a two-stream Recurrent Neural Network (RNN) to effectively exploit temporal features. The method aims to overcome the limitations of previous appearance-based techniques, such as high model complexity and sensitivity to noise, by focusing on inherent temporal artifacts in manipulated faces.

Abstract

Deepfakes is a branch of malicious techniques that transplant a target face onto the original one in videos, causing serious problems such as copyright infringement, misinformation, or even public panic. Previous efforts at detecting Deepfake videos mainly focused on appearance features, which risk being bypassed by sophisticated manipulation and also suffer from high model complexity and sensitivity to noise. Besides, how to mine the temporal features of manipulated videos and exploit them is still an open question. We propose an efficient and robust framework named LRNet for detecting Deepfake videos through temporal modeling on precise geometric features. A novel calibration module is devised to enhance the precision of geometric features, making them more discriminative, and a two-stream Recurrent Neural Network (RNN) is constructed to fully exploit temporal features. Compared with previous methods, our proposed method is lighter-weight and easier to train. Moreover, it has shown robustness in detecting highly compressed or noise-corrupted videos. Our model achieved 0.999 AUC on the FaceForensics++ dataset, with only a graceful decline in performance (-0.042 AUC) when faced with highly compressed videos.


Key findings
LRNet achieved an AUC of 0.999 on the FaceForensics++ dataset and demonstrated strong robustness against video compression, with only a -0.042 AUC decline on highly compressed videos, significantly outperforming other methods. It also showed superior robustness to video noise, with only a 0.91% accuracy decline. The framework is notably lightweight, requiring significantly fewer parameters and less training time than existing state-of-the-art methods.
Approach
The LRNet framework detects Deepfakes by modeling temporal characteristics of precise geometric facial features. It utilizes a novel calibration module that combines Lucas-Kanade optical flow and a customized Kalman filter to enhance landmark precision. A two-stream Recurrent Neural Network (RNN), specifically using GRU, then processes sequences of these calibrated landmarks and their temporal differences to classify video authenticity.
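The calibration idea can be illustrated with a minimal sketch: per-frame landmark detections are noisy, so each coordinate track is smoothed over time. The snippet below uses a generic constant-position Kalman filter on a single 1-D coordinate; it is an illustrative stand-in, not the paper's customized filter (which also fuses Lucas-Kanade optical-flow predictions), and the noise parameters are assumptions.

```python
import numpy as np

def kalman_smooth(measurements, process_var=1e-3, meas_var=1e-1):
    """Smooth one landmark-coordinate track over time with a
    constant-position Kalman filter (generic sketch, not LRNet's
    customized module; variances are illustrative assumptions)."""
    x = float(measurements[0])   # state estimate: the true coordinate
    p = 1.0                      # variance of the state estimate
    smoothed = [x]
    for z in measurements[1:]:
        p += process_var             # predict: uncertainty grows between frames
        k = p / (p + meas_var)       # Kalman gain: trust in the new detection
        x = x + k * (z - x)          # update estimate toward the measurement
        p = (1.0 - k) * p            # update uncertainty
        smoothed.append(x)
    return np.array(smoothed)
```

In the full pipeline each of the tracked landmark coordinates would be filtered this way, yielding the "precise geometric features" that feed the temporal model.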
Datasets
UADFV, FaceForensics++ (FF++), Celeb-DF, DeeperForensics-1.0 (DF1.0)
Model(s)
LRNet (Landmark Recurrent Network), which consists of a two-stream Recurrent Neural Network (RNN) using Gated Recurrent Units (GRU).
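The two-stream design described above can be sketched as follows: one GRU consumes the calibrated landmark sequence, a second consumes its frame-to-frame differences, and their scores are fused. This is a hand-rolled, randomly initialized numpy sketch to show the data flow; the class and function names (`GRU`, `two_stream_score`), dimensions, and the averaging fusion are illustrative assumptions, not the paper's trained architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRU:
    """Minimal GRU that returns the last hidden state of a sequence
    (bias terms omitted for brevity; weights are random, untrained)."""
    def __init__(self, in_dim, hid_dim, seed=0):
        rng = np.random.default_rng(seed)
        shape = (hid_dim, in_dim + hid_dim)
        self.Wz = rng.normal(0, 0.1, shape)   # update-gate weights
        self.Wr = rng.normal(0, 0.1, shape)   # reset-gate weights
        self.Wh = rng.normal(0, 0.1, shape)   # candidate-state weights
        self.hid_dim = hid_dim

    def last_hidden(self, seq):
        h = np.zeros(self.hid_dim)
        for x in seq:
            xh = np.concatenate([x, h])
            z = sigmoid(self.Wz @ xh)              # update gate
            r = sigmoid(self.Wr @ xh)              # reset gate
            h_cand = np.tanh(self.Wh @ np.concatenate([x, r * h]))
            h = (1.0 - z) * h + z * h_cand         # blend old state and candidate
        return h

def two_stream_score(landmark_seq, gru_pos, gru_diff, w):
    """Fuse a landmark-position stream and a temporal-difference stream
    into one fake-probability (averaging fusion is an assumption)."""
    diff_seq = np.diff(landmark_seq, axis=0)   # frame-to-frame differences
    h1 = gru_pos.last_hidden(landmark_seq)
    h2 = gru_diff.last_hidden(diff_seq)
    return sigmoid(0.5 * (w @ h1 + w @ h2))
```

The second stream sees only motion, which is where the temporal artifacts of manipulated faces are expected to show up.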
Author countries
China