Multiple Contexts and Frequencies Aggregation Network for Deepfake Detection

Authors: Zifeng Li, Wenzhong Tang, Shijun Gao, Shuai Wang, Yanxiang Wang

Published: 2024-08-03 05:34:53+00:00

AI Summary

This paper proposes MkfaNet, an efficient network for face forgery detection that leverages both spatial and frequency domain information. MkfaNet uses a Multi-Kernel Aggregator to capture subtle facial differences and a Multi-Frequency Aggregator to process different frequency bands, improving deepfake detection accuracy and robustness.

Abstract

Deepfake detection faces increasing challenges with the fast growth of generative models producing massive and diverse Deepfake technologies. Recent advances rely on introducing heuristic features from spatial or frequency domains rather than modeling general forgery features within backbones. To address this issue, we turn to the backbone design with two intuitive priors from spatial and frequency detectors, i.e., learning robust spatial attributes and frequency distributions that are discriminative for real and fake samples. To this end, we propose an efficient network for face forgery detection named MkfaNet, which consists of two core modules. For spatial contexts, we design a Multi-Kernel Aggregator that adaptively selects organ features extracted by multiple convolutions for modeling subtle facial differences between real and fake faces. For the frequency components, we propose a Multi-Frequency Aggregator to process different bands of frequency components by adaptively reweighting high-frequency and low-frequency features. Comprehensive experiments on seven popular deepfake detection benchmarks demonstrate that our proposed MkfaNet variants achieve superior performance in both within-domain and across-domain evaluations with impressive parameter efficiency.
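As an illustration of the frequency-domain prior, the sketch below shows one plausible way a Multi-Frequency Aggregator could adaptively reweight high- and low-frequency features. The paper's exact formulation is not reproduced here: the pooling-based band split, the gating module, and the class name MultiFrequencyAggregatorSketch are assumptions made for illustration only.

```python
# Minimal sketch (assumption, not the paper's implementation): split features
# into a low-frequency component (local average) and a high-frequency residual,
# then reweight the two bands with channel-wise gates learned from global context.
import torch
import torch.nn as nn


class MultiFrequencyAggregatorSketch(nn.Module):
    def __init__(self, channels: int, pool_size: int = 3):
        super().__init__()
        # Low-pass branch: local averaging keeps smooth (low-frequency) content.
        self.low_pass = nn.AvgPool2d(pool_size, stride=1, padding=pool_size // 2)
        # Per-band channel gates computed from globally pooled features (assumption).
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, 2 * channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        low = self.low_pass(x)          # low-frequency component
        high = x - low                  # high-frequency residual (edges, noise)
        w = self.gate(x)                # (B, 2C, 1, 1)
        w_low, w_high = w.chunk(2, dim=1)
        return w_low * low + w_high * high


if __name__ == "__main__":
    feats = torch.randn(2, 64, 56, 56)
    out = MultiFrequencyAggregatorSketch(64)(feats)
    print(out.shape)  # torch.Size([2, 64, 56, 56])
```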


Key findings
MkfaNet variants achieved superior performance in both within-domain and cross-domain evaluations across seven deepfake detection benchmarks. MkfaNet was notably parameter-efficient compared with competing backbone models. Visualizations confirmed that MkfaNet learns discriminative features and accurately localizes forgery artifacts.
Approach
MkfaNet integrates a Multi-Kernel Aggregator (MKA), which adaptively selects organ features extracted by multiple convolutions, and a Multi-Frequency Aggregator (MFA), which adaptively reweights high- and low-frequency components. Together, the two modules model the subtle spatial differences and frequency distributions that discriminate real from fake faces.
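To make the Multi-Kernel Aggregator idea concrete, here is a minimal sketch assuming depthwise convolutions with several kernel sizes whose outputs are combined by softmax selection weights, in the spirit of selective-kernel designs. The class name, kernel sizes, and selection mechanism are illustrative assumptions, not the paper's actual module.

```python
# Hypothetical sketch of multi-kernel aggregation: several depthwise convolutions
# with different receptive fields are fused with per-channel softmax weights
# computed from global context, so the network can adaptively pick the kernel
# scale that best exposes organ-level differences between real and fake faces.
import torch
import torch.nn as nn


class MultiKernelAggregatorSketch(nn.Module):
    def __init__(self, channels: int, kernel_sizes=(3, 5, 7)):
        super().__init__()
        # One depthwise convolution per kernel size (assumed branch design).
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernel_sizes
        )
        # Global context -> one selection logit per branch per channel.
        self.select = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, len(kernel_sizes) * channels, kernel_size=1),
        )
        self.num_branches = len(kernel_sizes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = torch.stack([b(x) for b in self.branches], dim=1)   # (B, K, C, H, W)
        logits = self.select(x).view(x.size(0), self.num_branches, -1, 1, 1)
        weights = logits.softmax(dim=1)                             # softmax over kernels
        return (weights * feats).sum(dim=1)


if __name__ == "__main__":
    feats = torch.randn(2, 64, 56, 56)
    out = MultiKernelAggregatorSketch(64)(feats)
    print(out.shape)  # torch.Size([2, 64, 56, 56])
```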
Datasets
FaceForensics++ (FF++, including FF-DF, FF-F2F, FF-FS, FF-NT), CelebDF-v1, CelebDF-v2, DeepFakeDetection (DFD), DeepFake Detection Challenge Preview (DFDC-P), DeepFake Detection Challenge (DFDC), DeeperForensics-1.0 (DF-1.0)
Model(s)
MkfaNet (variants: MkfaNet-T, MkfaNet-S); comparisons also made against ResNet, EfficientNet, Xception, Swin Transformer, and ConvNeXt.
Author countries
China