GRACE: Graph-Regularized Attentive Convolutional Entanglement with Laplacian Smoothing for Robust DeepFake Video Detection

Authors: Chih-Chung Hsu, Shao-Ning Chen, Mei-Hsuan Wu, Yi-Fang Wang, Chia-Ming Lee, Yi-Shiuan Chou

Published: 2024-06-28 14:17:16+00:00

AI Summary

This paper proposes GRACE, a DeepFake video detection method that is robust to noisy face sequences. GRACE uses a graph convolutional network with graph Laplacian smoothing to entangle spatial and temporal features, mitigating the impact of mis-detected faces. Comprehensive experiments demonstrate state-of-the-art performance, especially when the face sequences are noisy.

Abstract

As DeepFake video manipulation techniques escalate and pose profound threats, the need for efficient detection strategies has become urgent. One particular issue is that facial images can be mis-detected, often because of degraded videos or adversarial attacks, which introduces unexpected temporal artifacts that can undermine the efficacy of DeepFake video detection techniques. This paper introduces a novel method for robust DeepFake video detection, the proposed Graph-Regularized Attentive Convolutional Entanglement (GRACE), which is based on a graph convolutional network with a graph Laplacian and addresses the aforementioned challenges. First, conventional Convolutional Neural Networks are deployed to extract spatiotemporal features for the entire video. Then, the spatial and temporal features are mutually entangled by constructing a graph with a sparse constraint, so that the essential features of valid face images in the noisy face sequences are retained, thus improving stability and performance for DeepFake video detection. Furthermore, a graph Laplacian prior is introduced in the graph convolutional network to remove noise patterns in the feature space and further improve performance. Comprehensive experiments illustrate that the proposed method delivers state-of-the-art performance in DeepFake video detection under noisy face sequences. The source code is available at https://github.com/ming053l/GRACE.
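For intuition, a graph Laplacian prior is commonly written as a quadratic smoothness penalty on node features. The notation here is assumed for illustration and is not taken from the paper: X stacks the node features x_i as rows, A is a symmetric non-negative affinity matrix, and D is its degree matrix. The paper's exact formulation may differ.

```latex
L = D - A, \qquad D_{ii} = \sum_{j} A_{ij}, \qquad
\operatorname{tr}\!\left(X^{\top} L X\right)
  = \frac{1}{2} \sum_{i,j} A_{ij}\, \lVert x_i - x_j \rVert_2^2
```

Under this reading, the penalty is small when strongly connected nodes carry similar features, so a feature that disagrees with its neighbors (for example, one from a mis-detected face) is pulled toward them.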


Key findings

GRACE achieves state-of-the-art performance on multiple DeepFake datasets, surpassing existing methods, particularly in scenarios with noisy face sequences caused by adversarial attacks or unreliable face detection. The proposed graph Laplacian smoothing and sparsity constraint significantly improve robustness to noise. The ablation study shows that each component of GRACE contributes to its effectiveness.

Approach

GRACE leverages a CNN backbone to extract spatiotemporal features. These features are entangled via an affinity matrix to form a graph representation, which is processed by a GCN regularized with a graph Laplacian smoothing prior and a sparsity constraint to filter noise and focus on valid facial features. A classifier then determines video authenticity.
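The graph stage can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: cosine similarity for the affinity matrix, top-k neighbor selection as a stand-in for the sparsity constraint, and a closed-form Laplacian smoothing step in place of learned GCN layers are all choices made here for clarity. Every function name and parameter (k, lam) is hypothetical.

```python
import numpy as np


def build_sparse_affinity(features, k=4):
    """Cosine-similarity affinity between node features (assumed choice),
    keeping only the k strongest neighbors per node (assumed sparsifier)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    sim = np.maximum(unit @ unit.T, 0.0)  # clip to non-negative edge weights
    np.fill_diagonal(sim, 0.0)            # no self-loops

    n = sim.shape[0]
    k = min(k, n - 1)
    top = np.argsort(-sim, axis=1)[:, :k]  # indices of the k largest per row
    mask = np.zeros_like(sim, dtype=bool)
    mask[np.arange(n)[:, None], top] = True

    sparse = np.where(mask, sim, 0.0)
    return np.maximum(sparse, sparse.T)    # symmetrize


def graph_laplacian(affinity):
    """Unnormalized graph Laplacian L = D - A."""
    return np.diag(affinity.sum(axis=1)) - affinity


def laplacian_smoothness(features, laplacian):
    """Smoothness penalty tr(X^T L X); small when neighbors agree."""
    return float(np.trace(features.T @ laplacian @ features))


def smooth_features(features, laplacian, lam=1.0):
    """Minimizer of ||Z - X||_F^2 + lam * tr(Z^T L Z),
    obtained by solving (I + lam * L) Z = X."""
    n = laplacian.shape[0]
    return np.linalg.solve(np.eye(n) + lam * laplacian, features)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in for CNN features: 8 nodes, 16 dimensions each.
    feats = rng.normal(size=(8, 16))
    A = build_sparse_affinity(feats, k=3)
    L = graph_laplacian(A)
    smoothed = smooth_features(feats, L, lam=2.0)
    print("penalty before:", laplacian_smoothness(feats, L))
    print("penalty after: ", laplacian_smoothness(smoothed, L))
```

Because the Laplacian of a symmetric non-negative affinity matrix is positive semidefinite, the smoothing step can never increase the penalty. In a trained model the smoothing would instead act as a regularizer on learned GCN features, with a classifier applied afterwards.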

Datasets

FaceForensics++ (FF++), Celeb-DFv2, DeepFake Detection Challenge Dataset (DFDC)

Model(s)

Graph-Regularized Attentive Convolutional Entanglement (GRACE) using a 53-layer Cross Stage Partial Network (CSPNet) as the backbone and a Graph Convolutional Network (GCN)

Author countries

Taiwan