Deepfake Detection with Multi-Artifact Subspace Fine-Tuning and Selective Layer Masking

Authors: Xiang Zhang, Wenliang Weng, Daoyong Fu, Ziqiang Li, Zhangjie Fu

Published: 2026-01-03 02:33:18+00:00

AI Summary

This paper proposes a deepfake detection method called Multi-Artifact Subspaces and selective layer masks (MASM) to improve generalization robustness in cross-dataset scenarios. MASM decouples semantic and artifact representations by applying singular value decomposition (SVD) to the pretrained model weights, partitioning them into a stable semantic subspace and multiple learnable artifact subspaces. A selective layer mask strategy adaptively regulates layer updates to curb overfitting, and orthogonality and spectral consistency constraints further regularize the artifact subspaces.
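To make the SVD partition concrete, here is a minimal NumPy sketch of splitting one weight matrix. It assumes (our assumption, not the authors' stated recipe) that the semantic principal subspace is spanned by the top `r_sem` singular directions and that the remaining directions are divided evenly into `n_artifacts` artifact subspaces; `split_weight` and `reconstruct` are hypothetical helper names. In actual fine-tuning only the artifact factors would be trainable while `W_sem` stays frozen.

```python
import numpy as np

def split_weight(W, r_sem, n_artifacts):
    """Split W into a frozen semantic part and n_artifacts residual parts.

    Illustrative assumption: the semantic subspace is the top r_sem singular
    directions; the remaining directions are cut into n_artifacts contiguous,
    roughly equal groups, one per artifact subspace.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # Semantic principal subspace: largest singular values (kept frozen).
    W_sem = (U[:, :r_sem] * S[:r_sem]) @ Vt[:r_sem, :]
    # Residual singular directions, grouped into artifact subspaces.
    groups = np.array_split(np.arange(r_sem, len(S)), n_artifacts)
    artifacts = [(U[:, g], S[g], Vt[g, :]) for g in groups]
    return W_sem, artifacts

def reconstruct(W_sem, artifacts):
    # Semantic part plus every artifact component U_i diag(S_i) V_i^T.
    return W_sem + sum((U * S) @ Vt for U, S, Vt in artifacts)

# Usage: a random 64x48 "weight" split into 32 semantic directions
# plus 4 artifact subspaces of 4 directions each.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 48))
W_sem, arts = split_weight(W, r_sem=32, n_artifacts=4)
```

Because every singular direction lands in exactly one part, summing the parts recovers the original weight, so training starts from the pretrained model's behavior.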

Abstract

Deepfake detection still faces significant challenges in cross-dataset and real-world complex scenarios. The root cause lies in the high diversity of artifact distributions introduced by different forgery methods, while pretrained models tend to disrupt their original general semantic structures when adapting to new artifacts. Existing approaches usually rely on indiscriminate global parameter updates or introduce additional supervision signals, making it difficult to effectively model diverse forgery artifacts while preserving semantic stability. To address these issues, this paper proposes a deepfake detection method based on Multi-Artifact Subspaces and selective layer masks (MASM), which explicitly decouples semantic representations from artifact representations and constrains the fitting strength of artifact subspaces, thereby improving generalization robustness in cross-dataset scenarios. Specifically, MASM applies singular value decomposition to model weights, partitioning pretrained weights into a stable semantic principal subspace and multiple learnable artifact subspaces. This design enables decoupled modeling of different forgery artifact patterns while preserving the general semantic subspace. On this basis, a selective layer mask strategy is introduced to adaptively regulate the update behavior of corresponding network layers according to the learning state of each artifact subspace, suppressing overfitting to any single forgery characteristic. Furthermore, orthogonality constraints and spectral consistency constraints are imposed to jointly regularize multiple artifact subspaces, guiding them to learn complementary and diverse artifact representations while maintaining a stable overall spectral structure.
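The abstract does not give the exact form of the two constraints. The sketch below assumes a plausible choice for each: the orthogonality term sums ||U_i^T U_j||_F^2 over every pair of artifact-subspace bases, and the spectral consistency term measures the relative squared gap between the singular-value spectrum of the adapted weight and that of the pretrained weight. Both function names are hypothetical.

```python
import numpy as np

def orthogonality_loss(bases):
    """Sum of squared Frobenius norms of U_i^T U_j over all pairs i < j.

    `bases` is a list of (d, r_i) matrices, one per artifact subspace.
    The loss is zero exactly when every pair is mutually orthogonal.
    """
    loss = 0.0
    for i in range(len(bases)):
        for j in range(i + 1, len(bases)):
            loss += np.sum((bases[i].T @ bases[j]) ** 2)
    return loss

def spectral_consistency_loss(W_new, W_ref):
    """Relative squared gap between the singular-value spectra of two weights."""
    s_new = np.linalg.svd(W_new, compute_uv=False)
    s_ref = np.linalg.svd(W_ref, compute_uv=False)
    return np.sum((s_new - s_ref) ** 2) / np.sum(s_ref ** 2)

# Usage on synthetic matrices.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((16, 6)))   # 6 orthonormal columns
disjoint = [Q[:, :3], Q[:, 3:]]                     # mutually orthogonal subspaces
shared = [Q[:, :3], Q[:, 2:5]]                      # both contain column 2 of Q
W = rng.standard_normal((10, 8))
R, _ = np.linalg.qr(rng.standard_normal((8, 8)))    # random orthogonal matrix
loss_disjoint = orthogonality_loss(disjoint)
loss_shared = orthogonality_loss(shared)
loss_rotated = spectral_consistency_loss(W @ R, W)  # rotation keeps the spectrum
loss_scaled = spectral_consistency_loss(2 * W, W)   # doubles every singular value
```

The orthogonality term pushes subspaces toward capturing different artifact directions, while the spectral term tolerates rotations of the weight but penalizes changes to its overall singular-value profile.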


Key findings
MASM achieved superior generalization on cross-dataset deepfake detection benchmarks (FF++, CDF, DFDC-P, DFDC, DFD), as measured by both frame-level and video-level AUC, consistently outperforming the state-of-the-art methods it was compared against. The method also demonstrated strong robustness to common real-world distortions such as Gaussian blur, JPEG compression, and noise. Ablation studies confirmed that multi-artifact subspace fine-tuning, selective layer masking, and the proposed regularization constraints each contribute to the model's overall effectiveness.
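Frame-level AUC scores every frame independently, while video-level AUC first aggregates frame scores into one score per video. The aggregation rule is not stated in this summary; the sketch below assumes the common choice of averaging frame scores, and computes AUC as the probability that a randomly chosen fake sample outscores a randomly chosen real one (ties count as half). The toy scores are invented purely for illustration.

```python
from collections import defaultdict

def auc(labels, scores):
    """ROC AUC via pairwise comparison (label 1 = fake, 0 = real).

    O(P * N) in the number of positives and negatives; fine for a sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def video_level(video_ids, labels, scores):
    """Average frame scores per video (assumed aggregation rule)."""
    grouped = defaultdict(list)
    label_of = {}
    for v, y, s in zip(video_ids, labels, scores):
        grouped[v].append(s)
        label_of[v] = y
    vids = sorted(grouped)
    video_labels = [label_of[v] for v in vids]
    video_scores = [sum(grouped[v]) / len(grouped[v]) for v in vids]
    return video_labels, video_scores

# Toy data: two frames per video.
video_ids = ["a", "a", "b", "b", "c", "c", "d", "d"]
labels = [0, 0, 1, 1, 0, 0, 1, 1]
scores = [0.1, 0.6, 0.4, 0.9, 0.2, 0.3, 0.7, 0.8]
frame_auc = auc(labels, scores)
video_auc = auc(*video_level(video_ids, labels, scores))
```

In this toy case one misleading real frame (0.6) lowers frame-level AUC, but averaging within each video smooths it out, which is why the two metrics are usually reported separately.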
Approach
MASM explicitly decouples semantic and artifact representations within a pretrained model's weights using singular value decomposition (SVD), yielding a frozen semantic principal subspace and multiple learnable artifact subspaces. A selective layer mask (SLM) mechanism adaptively controls parameter updates in individual network layers based on their bias-variance ratio, suppressing overfitting to any single forgery characteristic. Orthogonality and spectral consistency constraints further regularize the artifact subspaces, encouraging complementary artifact representations while keeping the overall spectral structure stable.
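The summary does not define the bias-variance ratio, so the following is only a hypothetical sketch of how a selective layer mask could operate. It assumes the ratio is the squared norm of a layer's mean recent gradient (bias) divided by the average squared deviation of those gradients (variance), and that layers falling below a threshold are frozen for the current step. The direction of the test, the threshold value, and all function names are illustrative choices, not the authors' criterion.

```python
import numpy as np

def bias_variance_ratio(grad_history):
    """Assumed definition: ||mean gradient||^2 / mean squared deviation.

    grad_history is a (T, P) array holding the last T gradients of one
    layer's artifact parameters.
    """
    g = np.asarray(grad_history, dtype=float)
    mean = g.mean(axis=0)
    bias = np.sum(mean ** 2)
    variance = np.mean(np.sum((g - mean) ** 2, axis=1))
    return bias / (variance + 1e-12)  # epsilon avoids division by zero

def selective_layer_mask(histories, threshold=1.0):
    """1 = update the layer, 0 = freeze it (ratio below threshold)."""
    return {name: int(bias_variance_ratio(h) >= threshold)
            for name, h in histories.items()}

def masked_sgd_step(params, grads, mask, lr=0.1):
    # Plain SGD, applied only to layers whose mask entry is 1.
    return {name: p - lr * mask[name] * grads[name] for name, p in params.items()}

# Usage: "block0" has consistent gradients, "block1" oscillates around zero.
histories = {
    "block0": np.ones((4, 2)),
    "block1": np.array([[1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, 1.0]]),
}
mask = selective_layer_mask(histories)
params = {"block0": np.zeros(2), "block1": np.zeros(2)}
grads = {"block0": np.array([1.0, 1.0]), "block1": np.array([1.0, -1.0])}
new_params = masked_sgd_step(params, grads, mask)
```

Under these assumptions a layer whose recent updates mostly cancel out is treated as fitting noise and is held fixed, while layers with a consistent update direction keep training.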
Datasets
FaceForensics++ (FF++), Celeb-DF (CDF), DFDC Preview (DFDC-P), Deepfake Detection Challenge dataset (DFDC), DeepfakeDetection (DFD)
Model(s)
CLIP ViT-L/14
Author countries
China