SpecXNet: A Dual-Domain Convolutional Network for Robust Deepfake Detection

Authors: Inzamamul Alam, Md Tanvir Islam, Simon S. Woo

Published: 2025-09-26 08:51:59+00:00

AI Summary

The paper introduces SpecXNet, a Spectral Cross-Attentional Network, designed for robust deepfake detection using a dual-domain architecture. This network leverages a Dual-Domain Feature Coupler (DDFC) to decompose features into local spatial and global spectral branches. SpecXNet achieves state-of-the-art accuracy and strong generalization across diverse unseen manipulations and post-processing scenarios by dynamically fusing these features using Dual Fourier Attention (DFA).

Abstract

The increasing realism of content generated by GANs and diffusion models has made deepfake detection significantly more challenging. Existing approaches often focus solely on spatial or frequency-domain features, limiting their generalization to unseen manipulations. We propose the Spectral Cross-Attentional Network (SpecXNet), a dual-domain architecture for robust deepfake detection. The core Dual-Domain Feature Coupler (DDFC) decomposes features into a local spatial branch for capturing texture-level anomalies and a global spectral branch that employs Fast Fourier Transform to model periodic inconsistencies. This dual-domain formulation allows SpecXNet to jointly exploit localized detail and global structural coherence, which are critical for distinguishing authentic from manipulated images. We also introduce the Dual Fourier Attention (DFA) module, which dynamically fuses spatial and spectral features in a content-aware manner. Built atop a modified XceptionNet backbone, we embed the DDFC and DFA modules within a separable convolution block. Extensive experiments on multiple deepfake benchmarks show that SpecXNet achieves state-of-the-art accuracy, particularly under cross-dataset and unseen manipulation scenarios, while maintaining real-time feasibility. Our results highlight the effectiveness of unified spatial-spectral learning for robust and generalizable deepfake detection. To ensure reproducibility, we released the full code on GitHub (https://github.com/inzamamulDU/SpecXNet).


Key findings

SpecXNet achieves SOTA results, peaking at 96.4% average accuracy on deepfake benchmarks, outperforming existing spatial and spectrum-based methods. The dual-domain approach provides strong cross-dataset generalization, scoring 90.0% average accuracy on the practical TGen benchmark, and maintains high robustness against common post-processing artifacts. The architecture is also computationally efficient, with throughput high enough for real-time applications.
Approach

SpecXNet enhances the XceptionNet backbone by incorporating a Dual-Domain Feature Coupler (DDFC) within its separable convolution blocks. The DDFC splits input features into a local spatial branch (using standard convolutions) and a global spectral branch (using Fast Fourier Transform to capture periodic anomalies). These features are then adaptively merged using the Dual Fourier Attention (DFA) module, which performs content-aware cross-domain modulation and weighted fusion.
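The dual-branch split described above can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the learned convolutions of the spatial branch are stood in for by a simple box filter, the learned frequency-domain transform by a fixed scaling of the FFT coefficients, and the DFA fusion (omitted here) is replaced by plain channel concatenation. The function names (`ddfc`, `spatial_branch`, `spectral_branch`) are illustrative, not from the paper's code.

```python
import numpy as np

def spectral_branch(x):
    # Global spectral path: 2D FFT over the spatial grid, a placeholder
    # transform in the frequency domain (a learned modulation in SpecXNet),
    # then an inverse FFT back to the spatial domain.
    freq = np.fft.rfft2(x, axes=(-2, -1))
    freq *= 0.5  # stand-in for learned spectral weights
    return np.fft.irfft2(freq, s=x.shape[-2:], axes=(-2, -1))

def spatial_branch(x):
    # Local spatial path: 3x3 box filter as a stand-in for the learned
    # (separable) convolutions that capture texture-level anomalies.
    h, w = x.shape[-2:]
    pad = np.pad(x, ((0, 0), (1, 1), (1, 1)), mode="edge")
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += pad[:, dy:dy + h, dx:dx + w]
    return out / 9.0

def ddfc(x):
    # Split channels between the two domains, process each half in its own
    # branch, then recombine. SpecXNet fuses the branches with Dual Fourier
    # Attention; concatenation here is a simplification.
    c = x.shape[0] // 2
    local = spatial_branch(x[:c])
    global_ = spectral_branch(x[c:])
    return np.concatenate([local, global_], axis=0)

x = np.random.rand(8, 16, 16).astype(np.float32)
y = ddfc(x)
print(y.shape)  # (8, 16, 16): same feature-map shape as the input
```

Because both branches return maps on the original spatial grid, the block is shape-preserving and can be dropped inside a backbone stage, which is how the paper embeds DDFC/DFA within Xception's separable convolution blocks.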
Datasets

ImageNet, MS COCO, LSUN, Danbooru & Artist, ProGAN, StyleGAN2, StyleGAN3, BigGAN, EG3D, Taming Transformers, GLIDE, Stable Diffusion V1.4, Latent Diffusion, DALL-E 2, Guided Diffusion, SDXL, DiffusionDB, DreamBooth, Midjourney V4/V5, NightCafe, StableAI, YiJian, GenImage, FaceForensics++ (FF++).

Model(s)

SpecXNet (modified XceptionNet backbone with Dual-Domain Feature Coupler (DDFC) and Dual Fourier Attention (DFA)).

Author countries

Republic of Korea