WMamba: Wavelet-based Mamba for Face Forgery Detection

Authors: Siran Peng, Tianshuo Zhang, Li Gao, Xiangyu Zhu, Haoyuan Zhang, Kai Pang, Zhen Lei

Published: 2025-01-16 15:44:24+00:00

Comment: Accepted by ACM MM 2025

AI Summary

This paper introduces WMamba, a novel wavelet-based feature extractor built upon the Mamba architecture for robust face forgery detection. It enhances forgery detection by proposing Dynamic Contour Convolution (DCConv) to adaptively model slender facial contours and leveraging the Mamba architecture to capture long-range spatial relationships with linear complexity. WMamba effectively extracts fine-grained, globally distributed forgery artifacts, achieving state-of-the-art performance.

Abstract

The rapid evolution of deepfake generation technologies necessitates the development of robust face forgery detection algorithms. Recent studies have demonstrated that wavelet analysis can enhance the generalization abilities of forgery detectors. Wavelets effectively capture key facial contours, which are often slender, fine-grained, and globally distributed, and may conceal subtle forgery artifacts imperceptible in the spatial domain. However, current wavelet-based approaches fail to fully exploit the distinctive properties of wavelet data, resulting in sub-optimal feature extraction and limited performance gains. To address this challenge, we introduce WMamba, a novel wavelet-based feature extractor built upon the Mamba architecture. WMamba maximizes the utility of wavelet information through two key innovations. First, we propose Dynamic Contour Convolution (DCConv), which employs specially crafted deformable kernels to adaptively model slender facial contours. Second, by leveraging the Mamba architecture, our method captures long-range spatial relationships with linear complexity. This efficiency allows for the extraction of fine-grained, globally distributed forgery artifacts from small image patches. Extensive experiments show that WMamba achieves state-of-the-art (SOTA) performance, highlighting its effectiveness in face forgery detection.
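The deformable-kernel idea behind DCConv can be illustrated with plain deformable sampling: each tap of a 3x3 kernel is shifted from its regular grid position by a fractional offset, so the kernel can bend along a thin contour. The sketch below is a generic, single-point NumPy illustration of deformable convolution; the function names and the scalar formulation are ours for clarity, not the paper's DCConv implementation.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly interpolate feat (H, W) at a fractional location (y, x)."""
    H, W = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = y0 + 1, x0 + 1
    wy, wx = y - y0, x - x0

    def px(r, c):
        # Zero padding outside the image.
        return feat[r, c] if 0 <= r < H and 0 <= c < W else 0.0

    return ((1 - wy) * (1 - wx) * px(y0, x0)
            + (1 - wy) * wx * px(y0, x1)
            + wy * (1 - wx) * px(y1, x0)
            + wy * wx * px(y1, x1))

def deformable_conv_point(feat, weights, offsets, cy, cx):
    """One output value of a 3x3 deformable convolution centered at (cy, cx).

    `weights` holds 9 kernel weights; `offsets` holds 9 (dy, dx) fractional
    shifts that move each tap off the rigid square grid, letting the kernel
    adapt its footprint to a slender structure instead of a fixed 3x3 window.
    """
    out, k = 0.0, 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            oy, ox = offsets[k]
            out += weights[k] * bilinear_sample(feat, cy + dy + oy, cx + dx + ox)
            k += 1
    return out
```

With all offsets set to zero this reduces exactly to an ordinary 3x3 convolution; in a real network the offsets are predicted per location by a small learnable branch (as in standard deformable convolution).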


Key findings
WMamba achieves state-of-the-art performance in both cross-dataset and cross-manipulation evaluations, demonstrating exceptional generalization capability to unseen data and forgery types. Ablation studies confirm that DCConv significantly improves the capture of slender facial contours, and the Mamba architecture effectively extracts fine-grained, globally distributed forgery clues. The model also shows improved robustness against various real-world image degradations like compression, occlusion, noise, and blur.
Approach
The proposed WMamba architecture uses a Hierarchical Wavelet Feature Extraction Branch (HWFEB) that applies a multi-level Discrete Wavelet Transform (DWT) to obtain wavelet representations. These are processed by Wavelet Feature Extraction Modules (WFEMs), whose Dynamic Contour Convolution (DCConv) employs deformable kernels to adaptively model slender facial contours. The resulting features are then integrated into a VMamba backbone, built on the Mamba architecture, which captures long-range spatial dependencies and fine-grained forgery artifacts with linear complexity.
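The multi-level DWT at the core of the hierarchical branch can be sketched with the simplest wavelet, Haar: each level splits the image into a half-resolution approximation (LL) and three detail subbands (LH, HL, HH) where edge and contour energy concentrates, and the next level re-decomposes the LL band. This is a minimal NumPy sketch of the generic transform (the paper does not specify Haar; the function names are ours), not the HWFEB itself.

```python
import numpy as np

def haar_dwt2(x):
    """One level of the 2D Haar DWT on an image (H, W) with even sides.

    Returns four half-resolution subbands: LL (approximation) plus
    LH/HL/HH details, where contour and edge energy concentrates.
    """
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0
    lh = (a - b + c - d) / 2.0
    hl = (a + b - c - d) / 2.0
    hh = (a - b - c + d) / 2.0
    return ll, lh, hl, hh

def hierarchical_dwt(x, levels=3):
    """Multi-level decomposition: re-apply the DWT to the LL band,
    producing detail subbands at progressively coarser scales."""
    pyramid, ll = [], x
    for _ in range(levels):
        ll, lh, hl, hh = haar_dwt2(ll)
        pyramid.append((lh, hl, hh))
    return ll, pyramid
```

With this orthonormal scaling, the four subbands preserve the input's total energy, and a constant image yields all-zero detail bands, which is why the detail subbands isolate contour-like structure.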
Datasets
FaceForensics++ (FF++) for training; Celeb-DeepFake-v2 (CDF), DeepFake Detection Challenge (DFDC), DeepFake Detection Challenge Preview (DFDCP), FFIW-10K (FFIW) for cross-dataset evaluation; Digital Retinal Images for Vessel Extraction (DRIVE) for DCConv versatility assessment.
Model(s)
WMamba (main model, based on VMamba-S backbone and incorporating Dynamic Contour Convolution (DCConv)); VMamba-T, VMamba-S, VMamba-B (Mamba architecture variants); ConvNeXt, ViT, Swin (comparison backbones); U-Net (for DCConv versatility test).
Author countries
China