Patch-Discontinuity Mining for Generalized Deepfake Detection

Authors: Huanhuan Yuan, Yang Ping, Zhengqin Xu, Junyi Cao, Shuai Jia, Chao Ma

Published: 2025-12-26 13:18:14+00:00

Comment: Our paper was accepted by the IEEE Transactions on Multimedia

AI Summary

This paper introduces GenDF, a simple yet effective framework that transfers a powerful large-scale vision model (ViT) to the deepfake detection task with a compact network design. GenDF incorporates deepfake-specific representation learning to capture discriminative patterns, feature space redistribution to mitigate distribution mismatch, and a classification-invariant feature augmentation strategy to enhance generalization. It achieves state-of-the-art generalization performance in cross-domain and cross-manipulation settings with only 0.28M trainable parameters.

Abstract

The rapid advancement of generative artificial intelligence has enabled the creation of highly realistic fake facial images, posing serious threats to personal privacy and the integrity of online information. Existing deepfake detection methods often rely on handcrafted forensic cues and complex architectures, achieving strong performance in intra-domain settings but suffering significant degradation when confronted with unseen forgery patterns. In this paper, we propose GenDF, a simple yet effective framework that transfers a powerful large-scale vision model to the deepfake detection task with a compact and neat network design. GenDF incorporates deepfake-specific representation learning to capture discriminative patterns between real and fake facial images, feature space redistribution to mitigate distribution mismatch, and a classification-invariant feature augmentation strategy to enhance generalization without introducing additional trainable parameters. Extensive experiments demonstrate that GenDF achieves state-of-the-art generalization performance in cross-domain and cross-manipulation settings while requiring only 0.28M trainable parameters, validating the effectiveness and efficiency of the proposed framework.


Key findings
GenDF achieves state-of-the-art generalization performance in cross-domain and cross-manipulation settings, significantly outperforming existing methods. It is highly efficient, requiring only 0.28M trainable parameters, and shows superior robustness to a range of image-quality perturbations. The model learns to associate patch discontinuity with fake faces and patch continuity with real faces.
Approach
GenDF fine-tunes a Vision Transformer (ViT) with a Deepfake-Specific Representation Learning (DSRL) scheme based on Low-Rank Adaptation (LoRA), so that the model captures patch-discontinuity patterns in fake faces and patch continuity in real faces. It then applies Feature Space Redistribution (FSR), which optimizes the real and fake feature distributions separately to increase the inter-class distance. Finally, a Class-Invariant Feature Augmentation (CIFAug) function expands the feature space along class-invariant directions to improve generalization without adding trainable parameters.
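
To illustrate why LoRA keeps the trainable parameter count small, here is a minimal pure-Python sketch of a generic LoRA linear layer. It is not the authors' implementation: the class name `LoRALinear`, the helper `lora_param_count`, the rank, the scaling factor `alpha`, and the initialization are all assumptions chosen for illustration, and the listing does not attempt to reproduce the 0.28M figure, because the rank and the set of adapted layers are not stated here.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters of one LoRA adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A.

    Hypothetical illustration of generic LoRA, not GenDF's actual code.
    """

    def __init__(self, d_in, d_out, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        # Frozen "pretrained" weight (a random stand-in here), shape d_out x d_in.
        self.W = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable down-projection A, shape rank x d_in, small random init.
        self.A = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        # Trainable up-projection B, shape d_out x rank, zero init so the
        # low-rank update contributes nothing before training.
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)                    # frozen path
        update = matvec(self.B, matvec(self.A, x))  # low-rank path
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def frozen_params(self):
        return len(self.W) * len(self.W[0])


layer = LoRALinear(d_in=8, d_out=8, rank=2)
print(layer.trainable_params(), layer.frozen_params())  # 32 64
# For a 768 x 768 projection (the ViT-B hidden width) at an assumed rank of 4:
print(lora_param_count(768, 768, 4), 768 * 768)  # 6144 589824
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and only `A` and `B` would receive gradient updates during fine-tuning. The per-layer adapter cost grows linearly in the rank, which is why the total stays far below the size of the frozen backbone.
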
Datasets
FaceForensics++ (FF++), Celeb-DF, Deepfake Detection Challenge (DFDC), DeepfakeDetection (DFD)
Model(s)
Vision Transformer (ViT-B/16) backbone, Low-Rank Adaptation (LoRA)
Author countries
China