WMamba: Wavelet-based Mamba for Face Forgery Detection

Authors: Siran Peng, Tianshuo Zhang, Li Gao, Xiangyu Zhu, Haoyuan Zhang, Kai Pang, Zhen Lei

Published: 2025-01-16 15:44:24+00:00

AI Summary

WMamba is a novel wavelet-based face forgery detection method that utilizes Dynamic Contour Convolution (DCConv) to model slender facial contours and the Mamba architecture to efficiently capture long-range spatial relationships. This approach achieves state-of-the-art performance in face forgery detection.

Abstract

With the rapid advancement of deepfake generation technologies, the demand for robust and accurate face forgery detection algorithms has become increasingly critical. Recent studies have demonstrated that wavelet analysis can uncover subtle forgery artifacts that remain imperceptible in the spatial domain. Wavelets effectively capture important facial contours, which are often slender, fine-grained, and global in nature. However, existing wavelet-based approaches fail to fully leverage these unique characteristics, resulting in sub-optimal feature extraction and limited generalizability. To address this challenge, we introduce WMamba, a novel wavelet-based feature extractor built upon the Mamba architecture. WMamba maximizes the utility of wavelet information through two key innovations. First, we propose Dynamic Contour Convolution (DCConv), which employs specially crafted deformable kernels to adaptively model slender facial contours. Second, by leveraging the Mamba architecture, our method captures long-range spatial relationships with linear computational complexity. This efficiency allows for the extraction of fine-grained, global forgery artifacts from small image patches. Extensive experimental results show that WMamba achieves state-of-the-art (SOTA) performance, highlighting its effectiveness and superiority in face forgery detection.
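
To make the wavelet claim above concrete, here is a minimal sketch of a single-level 2D wavelet decomposition. It assumes the Haar wavelet, which is an illustrative choice only: the abstract does not say which wavelet WMamba uses. The function name `haar_dwt2` and the toy image are hypothetical.

```python
import numpy as np

def haar_dwt2(image):
    """Single-level 2D Haar transform (illustrative choice of wavelet).

    Returns four half-resolution subbands:
    (approx, row_detail, col_detail, diag_detail).
    """
    x = np.asarray(image, dtype=np.float64)
    if x.ndim != 2 or x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("expected a 2D array with even height and width")
    # The four pixels of every non-overlapping 2x2 block.
    a = x[0::2, 0::2]  # top-left
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0       # low-pass in both directions
    row_detail = (a + b - c - d) / 2.0   # top row minus bottom row
    col_detail = (a - b + c - d) / 2.0   # left column minus right column
    diag_detail = (a - b - c + d) / 2.0  # diagonal difference
    return approx, row_detail, col_detail, diag_detail

# Toy input: a flat 8x8 image with a one-pixel-wide vertical line at column 3.
img = np.zeros((8, 8))
img[:, 3] = 1.0
approx, row_detail, col_detail, diag_detail = haar_dwt2(img)

# The thin line appears only in the column-difference subband.
print(col_detail)
print(np.abs(row_detail).max(), np.abs(diag_detail).max())
```

In this toy case the slender vertical line is isolated in one detail subband while the other two detail subbands stay at zero. That is the sense in which wavelet coefficients can expose thin, contour-like structure.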


Key findings
WMamba achieves state-of-the-art performance on multiple test datasets, indicating strong generalization across both deepfake methods and data sources. Ablation studies confirm that DCConv and the Mamba architecture each improve detection accuracy.
Approach
WMamba leverages wavelet analysis to extract features, employing DCConv to model slender facial contours and the Mamba architecture for efficient long-range dependency capture. It integrates these features to classify images as real or fake.
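
The linear-cost, long-range behaviour attributed to Mamba can be illustrated with a plain state-space recurrence. The sketch below is a deliberately simplified assumption and not the WMamba or VMamba implementation. It uses fixed parameters `a`, `b`, `c` and a single 1D sequence, whereas Mamba makes its parameters input-dependent. The function name `ssm_scan` is hypothetical.

```python
import numpy as np

def ssm_scan(x, a, b, c):
    """Run h_t = a * h_{t-1} + b * x_t and y_t = c . h_t over a 1D sequence.

    a, b, c are vectors of the same length (a diagonal state transition).
    One pass over the sequence, so the cost grows linearly with its length.
    """
    x = np.asarray(x, dtype=np.float64)
    a, b, c = (np.asarray(p, dtype=np.float64) for p in (a, b, c))
    h = np.zeros_like(a)   # hidden state carried along the sequence
    y = np.empty_like(x)
    for t in range(x.shape[0]):
        h = a * h + b * x[t]        # fold the current token into the state
        y[t] = float(np.dot(c, h))  # read the output from the state
    return y

# An impulse at position 0 still influences every later output.
impulse = np.zeros(6)
impulse[0] = 1.0
y = ssm_scan(impulse, a=[0.5], b=[1.0], c=[1.0])
print(y)
```

Each output depends on all earlier inputs through the hidden state, yet the loop visits every position exactly once. Self-attention, by contrast, compares every pair of positions.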
Datasets
FaceForensics++ (FF++) for training; Celeb-DeepFake-v2 (CDF), DeepFake Detection Challenge (DFDC), DFDC Preview (DFDCP), and FFIW-10K (FFIW) for testing.
Model(s)
WMamba, which combines Dynamic Contour Convolution (DCConv) and the Mamba architecture. VMamba-S is used as the backbone network.
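
The summary does not describe how DCConv's kernels are built, so the following is only a generic sketch of deformable sampling, the idea that such kernels rest on. It is not DCConv itself. The offsets are set by hand here, whereas a deformable layer would predict them from the input. The names `bilinear_sample` and `deformable_row_response` are hypothetical.

```python
import numpy as np

def bilinear_sample(img, y, x):
    """Sample img at a fractional (y, x), clamping coordinates to the border."""
    h, w = img.shape
    y = min(max(float(y), 0.0), h - 1.0)
    x = min(max(float(x), 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = (1 - wx) * img[y0, x0] + wx * img[y0, x1]
    bottom = (1 - wx) * img[y1, x0] + wx * img[y1, x1]
    return (1 - wy) * top + wy * bottom

def deformable_row_response(img, center, weights, offsets):
    """Weighted sum of taps along a row, each tap shifted vertically by its offset."""
    cy, cx = center
    half = len(weights) // 2
    total = 0.0
    for k, (w_k, dy) in enumerate(zip(weights, offsets)):
        total += w_k * bilinear_sample(img, cy + dy, cx + (k - half))
    return total

# Toy contour: a one-pixel-wide diagonal line.
img = np.eye(7)
rigid = deformable_row_response(img, (3, 3), [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
bent = deformable_row_response(img, (3, 3), [1.0, 1.0, 1.0], [-1.0, 0.0, 1.0])
print(rigid, bent)
```

With zero offsets the horizontal kernel touches the diagonal line at one tap only. With offsets that follow the line, all three taps land on it, which is why a kernel that can bend suits slender contours.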
Author countries
China