Open-Set Deepfake Detection: A Parameter-Efficient Adaptation Method with Forgery Style Mixture

Authors: Chenqi Kong, Anwei Luo, Peijun Bao, Haoliang Li, Renjie Wan, Zengwei Zheng, Anderson Rocha, Alex C. Kot

Published: 2024-08-23 01:53:36+00:00

AI Summary

This paper proposes a parameter-efficient method for open-set deepfake detection that addresses two limitations of existing models: poor generalization across unknown forgery domains and inefficient adaptation to new data. It does so by introducing a forgery-style mixture formulation and a ViT-based model in which only lightweight inserted modules are optimized during training, preserving the pre-trained knowledge.

Abstract

Open-set face forgery detection poses significant security threats and presents substantial challenges for existing detection models. These detectors have two primary limitations: they cannot generalize across unknown forgery domains, and they adapt inefficiently to new data. To address these issues, we introduce a face forgery detection approach that is both general and parameter-efficient. It builds on the assumption that different forgery source domains exhibit distinct style statistics, and we design a forgery-style mixture formulation that augments the diversity of forgery source domains, enhancing the model's generalizability across unseen domains. Previous methods typically require fully fine-tuning pre-trained networks, consuming substantial time and computational resources. Instead, drawing on recent advancements in vision transformers (ViTs) for face forgery detection, we develop a parameter-efficient ViT-based detection model that includes lightweight forgery feature extraction modules and extracts global and local forgery clues simultaneously. We optimize only the inserted lightweight modules during training, keeping the original ViT structure and its pre-trained ImageNet weights intact. This training strategy effectively preserves the informative pre-trained knowledge while flexibly adapting the model to the Deepfake detection task. Extensive experiments demonstrate that the designed model achieves state-of-the-art generalizability with significantly fewer trainable parameters, representing an important step toward open-set Deepfake detection in the wild.
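To make the parameter-efficient training strategy concrete, the following is a minimal PyTorch sketch of how a trainable low-rank (LoRA) update can be attached to a frozen pre-trained linear layer and how a bottleneck Adapter can be inserted with a residual path. The rank, bottleneck width, and insertion points are illustrative assumptions, not the paper's exact configuration.

import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen pre-trained linear layer plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # keep the pre-trained ImageNet weights fixed
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)   # trainable down-projection
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)  # trainable up-projection
        nn.init.zeros_(self.lora_b.weight)  # start as a zero update so base behavior is preserved

    def forward(self, x):
        return self.base(x) + self.lora_b(self.lora_a(x))


class Adapter(nn.Module):
    """Lightweight bottleneck adapter with a residual connection."""

    def __init__(self, dim: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)   # project to a small bottleneck
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)     # project back to the ViT width

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))

Under such a setup, only the LoRA projections, the Adapter layers, and the classification head are passed to the optimizer; every original ViT parameter keeps its pre-trained value.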


Key findings
The proposed model achieves state-of-the-art generalizability and robustness with only 1.34M trainable parameters. It outperforms existing methods in cross-manipulation and cross-dataset evaluations, demonstrating its effectiveness in open-set deepfake detection, and it also shows superior robustness to common image/video perturbations.
Approach
The approach uses a Vision Transformer (ViT) as a backbone, incorporating lightweight Adapter and LoRA layers for parameter-efficient learning. A forgery-style mixture module augments the training data by mixing feature statistics from different forgery styles to improve generalization.
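A minimal sketch of the statistics-mixing step is given below, assuming a MixStyle-like formulation over ViT token features in which channel-wise means and standard deviations from different forgery source domains are interpolated; the paper's exact mixing rule may differ.

import torch


def mix_forgery_styles(feats: torch.Tensor, alpha: float = 0.1) -> torch.Tensor:
    """Mix style statistics within a batch of features from different forgery domains.

    feats: (B, N, C) token features, one forgery source domain per sample.
    """
    b = feats.size(0)
    mu = feats.mean(dim=1, keepdim=True)             # per-sample channel means
    sigma = feats.std(dim=1, keepdim=True) + 1e-6    # per-sample channel stds
    normalized = (feats - mu) / sigma                # remove instance-specific style

    perm = torch.randperm(b, device=feats.device)    # pair each sample with another domain
    lam = torch.distributions.Beta(alpha, alpha).sample((b, 1, 1)).to(feats.device)
    mu_mix = lam * mu + (1 - lam) * mu[perm]         # interpolate channel means
    sigma_mix = lam * sigma + (1 - lam) * sigma[perm]  # interpolate channel stds

    return normalized * sigma_mix + mu_mix           # re-style features with mixed statistics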
Datasets
FaceForensics++ (FF++) for training; CelebDF-v2, WildDeepfake, DeepFake Detection Challenge (DFDC), DeepFake Detection Challenge Preview (DFDC-P), DeeperForensics-1.0, and Face Forensics in the Wild (FFIW) for testing.
Model(s)
Vision Transformer (ViT) with lightweight Adapter and LoRA layers.
Author countries
Singapore, China, Hong Kong SAR, Brazil