Towards Robust DeepFake Detection under Unstable Face Sequences: Adaptive Sparse Graph Embedding with Order-Free Representation and Explicit Laplacian Spectral Prior

Authors: Chih-Chung Hsu, Shao-Ning Chen, Chia-Ming Lee, Yi-Fang Wang, Yi-Shiuan Chou

Published: 2025-12-08 12:31:07+00:00

Comment: 16 pages (including appendix)

AI Summary

This paper introduces a Laplacian-Regularized Graph Convolutional Network (LR-GCN) for robust DeepFake detection, specifically designed to handle noisy or unordered face sequences. It employs an Order-Free Temporal Graph Embedding (OF-TGE) to construct an adaptive sparse graph from frame-wise CNN features, bypassing rigid temporal continuity. The method incorporates dual-level sparsity and an explicit Graph Laplacian Spectral Prior to effectively filter noise and highlight forgery artifacts.

Abstract

Ensuring the authenticity of video content remains challenging as DeepFake generation becomes increasingly realistic and robust against detection. Most existing detectors implicitly assume temporally consistent and clean facial sequences, an assumption that rarely holds in real-world scenarios where compression artifacts, occlusions, and adversarial attacks destabilize face detection and often lead to invalid or misdetected faces. To address these challenges, we propose a Laplacian-Regularized Graph Convolutional Network (LR-GCN) that robustly detects DeepFakes from noisy or unordered face sequences, while being trained only on clean facial data. Our method constructs an Order-Free Temporal Graph Embedding (OF-TGE) that organizes frame-wise CNN features into an adaptive sparse graph based on semantic affinities. Unlike traditional methods constrained by strict temporal continuity, OF-TGE captures intrinsic feature consistency across frames, making it resilient to shuffled, missing, or heavily corrupted inputs. We further impose a dual-level sparsity mechanism on both graph structure and node features to suppress the influence of invalid faces. Crucially, we introduce an explicit Graph Laplacian Spectral Prior that acts as a high-pass operator in the graph spectral domain, highlighting structural anomalies and forgery artifacts, which are then consolidated by a low-pass GCN aggregation. This sequential design effectively realizes a task-driven spectral band-pass mechanism that suppresses background information and random noise while preserving manipulation cues. Extensive experiments on FF++, Celeb-DFv2, and DFDC demonstrate that LR-GCN achieves state-of-the-art performance and significantly improved robustness under severe global and local disruptions, including missing faces, occlusions, and adversarially perturbed face detections.
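The band-pass claim in the abstract can be illustrated with a short spectral calculation. The operators below are an assumption made for illustration: the abstract does not state the exact form of the Laplacian prior or of the GCN propagation matrix, so this uses the symmetric normalized Laplacian and a "lazy" propagation step whose response stays in [0, 1].

```latex
% Assumed operators (not taken from the paper): symmetric normalized Laplacian
% and a lazy GCN-style propagation matrix I - L/2.
L = I - D^{-1/2} A D^{-1/2} = U \Lambda U^{\top}, \qquad \lambda_i \in [0, 2]
\\
\text{high-pass prior: } X_h = L X \;\Rightarrow\; h(\lambda) = \lambda
\\
\text{low-pass aggregation: } X_o = \bigl(I - \tfrac{1}{2} L\bigr) X_h \;\Rightarrow\; g(\lambda) = 1 - \tfrac{\lambda}{2}
\\
\text{combined response: } g(\lambda)\, h(\lambda) = \lambda \bigl(1 - \tfrac{\lambda}{2}\bigr),
\quad \text{zero at } \lambda = 0 \text{ and } \lambda = 2, \ \text{maximal at } \lambda = 1
```

Under these assumptions the smoothest graph component (λ = 0, shared background) and the most oscillatory one (λ = 2, node-level noise) are both removed, and mid-frequency components pass through, which matches the band-pass behavior the abstract describes.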


Key findings
LR-GCN achieves state-of-the-art performance on FF++, Celeb-DFv2, and DFDC and is markedly more robust under severe global and local disruptions, including missing faces, occlusions, and adversarially perturbed face detections. The robustness is most evident at high masking ratios (e.g., mr = 0.8). Whereas other methods degrade substantially under such noisy conditions, LR-GCN maintains high detection accuracy and remains resilient to domain shift in cross-dataset evaluations.
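To make the masking ratio concrete, here is a minimal sketch of how such a disruption could be simulated. It assumes that mr is the fraction of frames in a sequence whose face crop is invalid and that an invalid frame is represented by zeros. Both are assumptions for illustration, and `mask_frames` is a hypothetical helper, not the paper's evaluation protocol.

```python
import numpy as np

def mask_frames(frames, mr, rng):
    """Zero out a random fraction `mr` of frames (hypothetical helper).

    frames: array of shape (num_frames, ...), one entry per face crop or feature.
    Returns the masked copy and the sorted indices of the masked frames.
    """
    n = frames.shape[0]
    n_masked = int(round(mr * n))          # number of frames treated as invalid
    idx = rng.choice(n, size=n_masked, replace=False)
    out = frames.copy()                    # leave the clean sequence untouched
    out[idx] = 0.0                         # assumed stand-in for a missing face
    return out, np.sort(idx)
```

With a 10-frame sequence and mr = 0.8, this leaves only 2 valid frames, which shows how little clean evidence a detector has in the high-masking setting.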
Approach
LR-GCN uses an Order-Free Temporal Graph Embedding (OF-TGE) to build an adaptive sparse graph from the semantic affinities between frame-wise CNN features, which makes it resilient to corrupted or unordered inputs. Dual-level sparsity is applied to both the graph structure and the node features to suppress the influence of invalid faces. An explicit Graph Laplacian Spectral Prior acts as a high-pass filter that highlights forgery artifacts; these are then consolidated by GCN aggregation, which acts as a low-pass filter. Applied in sequence, the two form a task-driven spectral band-pass mechanism.
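The pipeline described above can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the authors' implementation: the graph is a cosine-similarity k-nearest-neighbor graph, the high-pass step is multiplication by the symmetric normalized Laplacian, the low-pass step is a parameter-free propagation `I - L/2` in place of a trained GCN layer, and feature-level sparsity is omitted. All function names and the parameter `k` are hypothetical.

```python
import numpy as np

def build_sparse_affinity(feats, k=4):
    """Symmetric kNN graph from cosine similarity; frame order is never used."""
    normed = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)              # exclude self when ranking neighbors
    n = feats.shape[0]
    nbrs = np.argsort(-sim, axis=1)[:, :k]      # k most similar frames per node
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()
    adj = np.zeros((n, n))
    adj[rows, cols] = np.clip(sim[rows, cols], 0.0, None)  # keep non-negative weights
    return np.maximum(adj, adj.T)               # symmetrize

def normalized_laplacian(adj):
    """L = I - D^{-1/2} A D^{-1/2}; isolated nodes get a zero scaling factor."""
    deg = adj.sum(axis=1)
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    return np.eye(len(adj)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]

def band_pass_embed(feats, k=4):
    """High-pass (Laplacian) then low-pass (lazy propagation), then mean readout."""
    lap = normalized_laplacian(build_sparse_affinity(feats, k))
    high = lap @ feats                           # emphasize differences between neighbors
    low = (np.eye(len(lap)) - 0.5 * lap) @ high  # smooth the highlighted signal
    return low.mean(axis=0)                      # order-free video-level embedding
```

Because the graph is built from feature similarity alone and the readout is a mean over nodes, shuffling the input frames leaves the embedding unchanged up to floating-point error, which is the order-free property the approach relies on.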
Datasets
FF++ [8], Celeb-DFv2 [9], DFDC [10]
Model(s)
Laplacian-Regularized Graph Convolutional Network (LR-GCN) with a 53-layer Cross Stage Partial Network (CSPNet) [41] as the backbone network.
Author countries
Taiwan