GM-DF: Generalized Multi-Scenario Deepfake Detection

Authors: Yingxin Lai, Zitong Yu, Jing Yang, Bin Li, Xiangui Kang, Linlin Shen

Published: 2024-06-28 17:42:08+00:00

AI Summary

This paper introduces GM-DF, a generalized multi-scenario deepfake detection framework that addresses the limited generalization capacity of existing methods when encountering unseen scenarios and unknown attacks. GM-DF achieves this by jointly training on multiple datasets using a hybrid expert modeling approach, CLIP for common feature extraction, masked image reconstruction, and a domain-aware meta-learning strategy.

Abstract

Existing face forgery detection usually follows the paradigm of training models in a single domain, which limits their generalization capacity when unseen scenarios and unknown attacks occur. In this paper, we investigate the generalization capacity of deepfake detection models jointly trained on multiple face forgery detection datasets. We first find that detection accuracy degrades rapidly when models are trained directly on combined datasets, owing to discrepancies across collection scenarios and generation methods. To address this issue, we propose a Generalized Multi-Scenario Deepfake Detection framework (GM-DF) that serves multiple real-world scenarios with a unified model. First, we propose a hybrid expert modeling approach for domain-specific real/forgery feature extraction. For the commonality representation, we use CLIP to extract common features that better align visual and textual features across domains. We also introduce a masked image reconstruction mechanism that forces models to capture rich forgery details. Finally, we supervise the models via a domain-aware meta-learning strategy to further enhance their generalization capacity; specifically, we design a novel domain alignment loss that tightly aligns the distributions of the meta-test and meta-train domains. The updated models are thus able to represent both dataset-specific and common real/forgery features across multiple datasets. Given the lack of study of multi-dataset training, we establish a new benchmark leveraging multi-source data to fairly evaluate models' generalization capacity on unseen scenarios. Both qualitative and quantitative experiments on five datasets, under traditional protocols as well as the proposed benchmark, demonstrate the effectiveness of our approach.
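The abstract's domain alignment loss penalizes the distance between meta-train and meta-test feature distributions, but the summary does not reproduce its exact form. A common stand-in for such a loss is the (biased) squared maximum mean discrepancy with an RBF kernel; the sketch below is illustrative only, and the `gamma` bandwidth and feature shapes are assumptions, not values from the paper.

```python
import numpy as np

def mmd_loss(feat_train, feat_test, gamma=0.05):
    """Biased squared MMD with an RBF kernel: an illustrative stand-in
    for a domain alignment loss between meta-train and meta-test features.
    feat_train, feat_test: (n, d) arrays of per-sample features."""
    def rbf(a, b):
        # Pairwise squared Euclidean distances, then Gaussian kernel
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    return (rbf(feat_train, feat_train).mean()
            + rbf(feat_test, feat_test).mean()
            - 2.0 * rbf(feat_train, feat_test).mean())

rng = np.random.default_rng(0)
aligned = mmd_loss(rng.normal(size=(64, 16)), rng.normal(size=(64, 16)))
shifted = mmd_loss(rng.normal(size=(64, 16)), rng.normal(size=(64, 16)) + 2.0)
print(aligned < shifted)  # a mean-shifted domain yields a larger MMD
```

Minimizing such a term during meta-learning pushes the feature extractor to produce statistically similar representations for held-out (meta-test) and training (meta-train) domains, which matches the alignment goal described in the abstract.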


Key findings

GM-DF outperforms existing methods on both traditional protocols and a newly proposed benchmark for multi-dataset deepfake detection. The approach effectively addresses the issue of accuracy degradation when training on combined datasets due to domain discrepancies. The model demonstrates robustness against various image distortions.
Approach

GM-DF uses a hybrid expert modeling approach for domain-specific feature extraction, CLIP to extract common features aligning visual and textual information, and a masked image reconstruction mechanism to capture forgery details. It employs a domain-aware meta-learning strategy with a novel domain alignment loss to enhance generalization.
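The "hybrid expert modeling" above is a mixture-of-experts design: per-domain experts whose outputs are mixed by a learned gate. The minimal numpy sketch below shows only the gating mechanics with linear experts; the shapes, the linear form of the experts, and the softmax gate are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

def moe_forward(x, expert_weights, gate_weights):
    """Minimal mixture-of-experts sketch: each expert is a linear map,
    and a per-sample softmax gate mixes their outputs.

    x:              (batch, d_in) input features
    expert_weights: (n_experts, d_in, d_out), one linear expert per domain
    gate_weights:   (d_in, n_experts) linear gating network
    """
    # Per-sample gating scores -> softmax mixture weights over experts
    logits = x @ gate_weights                    # (batch, n_experts)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    gates = np.exp(logits)
    gates /= gates.sum(axis=1, keepdims=True)

    # Every expert's output, then a gate-weighted sum across experts
    expert_out = np.einsum("bi,eij->bej", x, expert_weights)
    return np.einsum("be,bej->bj", gates, expert_out)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = moe_forward(x, rng.normal(size=(3, 8, 5)), rng.normal(size=(8, 3)))
print(out.shape)  # (4, 5)
```

The gate lets the model route each sample toward the expert(s) specialized for its source domain while still producing a single fused representation, which is the intuition behind combining domain-specific experts with CLIP's shared features.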
Datasets

FaceForensics++ (FF++), Celeb-DF (V2), WildDeepfake (WDF), DFDC, DeepFakeFace (DFF)
Model(s)

ViT-B/16, CLIP, Mixture of Experts (MoE), RetinaFace
Author countries

China