Reduced Spatial Dependency for More General Video-level Deepfake Detection
Authors: Beilin Chu, Xuan Xu, Yufei Zhang, Weike You, Linna Zhou
Published: 2025-03-05 08:51:55+00:00
AI Summary
This paper proposes Spatial Dependency Reduction (SDR) for video deepfake detection, aiming to improve generalization by reducing the model's reliance on spatial information and focusing instead on temporal consistency cues. SDR integrates common temporal features from multiple spatially-perturbed feature clusters via a novel Task-Relevant Feature Integration (TRFI) module, then applies a temporal transformer to capture long-range dependencies.
Abstract
As a prominent form of AI-generated content, Deepfake has raised significant safety concerns. Although temporal consistency cues have been shown to offer better generalization capability, existing CNN-based methods inevitably introduce spatial bias, which hinders the extraction of intrinsic temporal features. To address this issue, we propose a novel method called Spatial Dependency Reduction (SDR), which integrates common temporal consistency features from multiple spatially-perturbed clusters to reduce the model's dependency on spatial information. Specifically, we design multiple Spatial Perturbation Branches (SPBs) to construct spatially-perturbed feature clusters. Subsequently, drawing on mutual information theory, we propose a Task-Relevant Feature Integration (TRFI) module to capture temporal features residing in a similar latent space across these clusters. Finally, the integrated features are fed into a temporal transformer to capture long-range dependencies. Extensive benchmarks and ablation studies demonstrate the effectiveness and rationale of our approach.
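The abstract describes a three-stage pipeline: spatially-perturbed branches produce feature clusters, a TRFI module fuses them, and a temporal transformer models long-range dependencies. The sketch below illustrates only that overall structure; the perturbation (random spatial masking), the frame encoder (global average pooling), the fusion (a plain mean in place of the mutual-information-based TRFI), and the single-head attention stand-in for the temporal transformer are all simplified stand-ins assumed for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def spatial_perturbation_branch(clip, drop_prob=0.3):
    # Stand-in SPB: zero out random spatial locations so each branch
    # sees a differently perturbed view of the same clip.
    mask = rng.random(clip.shape[-2:]) > drop_prob  # one spatial mask
    return clip * mask                              # broadcasts over (T, C)

def extract_temporal_features(clip):
    # Toy frame encoder: global average pooling per frame
    # (a real SPB would use a CNN backbone here).
    return clip.mean(axis=(-2, -1))                 # (T, C)

def integrate(feature_clusters):
    # Stand-in for TRFI: average the per-branch features; the paper's
    # module instead selects task-relevant, mutually-informative features.
    return np.mean(feature_clusters, axis=0)        # (T, C)

def temporal_self_attention(x):
    # Single-head self-attention over the time axis, the core operation
    # a temporal transformer uses to capture long-range dependencies.
    scores = x @ x.T / np.sqrt(x.shape[-1])         # (T, T) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over time
    return weights @ x                              # (T, C)

# A toy clip: T=8 frames, C=3 channels, 16x16 pixels.
clip = rng.random((8, 3, 16, 16))

# Multiple spatially-perturbed branches -> feature clusters.
clusters = [extract_temporal_features(spatial_perturbation_branch(clip))
            for _ in range(3)]

fused = integrate(clusters)           # (8, 3) integrated temporal features
out = temporal_self_attention(fused)  # (8, 3) temporally contextualized
print(out.shape)
```

Because each branch sees a different spatial corruption of the same frames, only information shared across branches (the temporal signal) survives the fusion step, which is the intuition behind reducing spatial dependency.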