DYMAPIA: A Multi-Domain Framework for Detecting AI-based Video Manipulation

Authors: Md Shohel Rana, Andrew H. Sung

Published: 2026-04-27 12:53:43+00:00

AI Summary

DYMAPIA is a multi-domain Deepfake detection framework that combines spatial, spectral, and temporal cues to identify subtle manipulation traces in visual data. It generates dynamic anomaly masks from Fourier spectra, texture, edge irregularities, and optical flow, which then guide a lightweight, Xception-distilled classifier called DistXCNet. This framework achieves state-of-the-art accuracy and F1-scores exceeding 99% on FF++, Celeb-DF, and VDFD benchmarks, while being compact enough for real-time deployment.

Abstract

AI-generated media are advancing rapidly, raising pressing concerns for content authenticity and digital trust. We introduce DYMAPIA, a multi-domain Deepfake detection framework that fuses spatial, spectral, and temporal cues to capture subtle traces of manipulation in visual data. The system builds dynamic anomaly masks by combining evidence from Fourier spectra, local texture descriptors, edge irregularities, and optical flow consistency, which highlight tampered regions with fine spatial accuracy. These masks guide DistXCNet, a lightweight classifier distilled from Xception and optimized with depthwise separable convolutions for fast, region-focused classification. This joint design achieves state-of-the-art results, with accuracy and F1-scores exceeding 99% on FF++, Celeb-DF, and VDFD benchmarks, while keeping the model compact enough for real-time use. Beyond outperforming existing full-frame and multi-domain detectors, DYMAPIA demonstrates deployment readiness for time-critical forensic tasks, including media verification, misinformation defense, and secure content filtering.
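To illustrate why depthwise separable convolutions keep DistXCNet compact, the sketch below implements one such layer in plain NumPy: a per-channel 3x3 depthwise pass followed by a 1x1 pointwise mixing step. This is an illustrative reconstruction of the generic operator, not the paper's actual DistXCNet code; the function name, shapes, and lack of stride/activation are simplifying assumptions. A standard 3x3 convolution from C_in to C_out channels needs 9·C_in·C_out weights, while the separable version needs only 9·C_in + C_in·C_out, which is the source of the parameter savings the abstract refers to.

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """Illustrative depthwise separable convolution (no stride, zero padding).

    x          : input feature map, shape (H, W, C_in)
    dw_kernels : one 3x3 filter per input channel, shape (3, 3, C_in)
    pw_weights : 1x1 pointwise mixing matrix, shape (C_in, C_out)
    returns    : output feature map, shape (H, W, C_out)
    """
    H, W, C = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))  # zero-pad spatially
    dw = np.zeros_like(x)
    # Depthwise pass: each channel is filtered independently by its own kernel.
    for i in range(H):
        for j in range(W):
            patch = padded[i:i + 3, j:j + 3, :]            # (3, 3, C_in)
            dw[i, j, :] = np.sum(patch * dw_kernels, axis=(0, 1))
    # Pointwise pass: 1x1 convolution mixes channels at each spatial location.
    return dw @ pw_weights                                  # (H, W, C_out)

# Parameter count comparison for C_in = C_out = 64 (values are assumptions):
# standard 3x3 conv: 9 * 64 * 64 = 36864 weights
# depthwise separable: 9 * 64 + 64 * 64 = 4672 weights
```

The nested Python loops are for clarity only; a real implementation would rely on a deep learning framework's fused separable-convolution kernels.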


Key findings
DYMAPIA with DistXCNet consistently outperforms state-of-the-art baselines across all evaluated datasets, achieving F1-scores of 99.95% on FF++, 99.96% on CBDF, and 99.76% on VDFD. The framework is highly efficient, with fewer than 14K parameters and real-time inference, making it suitable for deployment in time-critical forensic tasks.
Approach
DYMAPIA first generates dynamic anomaly masks by fusing evidence from frequency-domain analysis (Fourier spectra), spatial-domain analysis (Local Binary Patterns for texture, Canny for edges), and temporal consistency analysis (dense optical flow). These pixel-level anomaly masks are then fed into DistXCNet, a lightweight, depthwise separable convolutional neural network distilled from Xception, which performs mask-guided classification to determine whether the content is authentic or manipulated.
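The fusion step described above can be sketched as follows. This is a minimal NumPy illustration of the general idea, not the paper's pipeline: gradient magnitude stands in for Canny edges, a frame difference stands in for dense optical flow, the LBP texture branch is omitted, and all function names, weights, and the threshold are assumptions. Each branch produces a normalized per-pixel anomaly map, and a weighted sum is thresholded into a binary mask.

```python
import numpy as np

def normalize(m):
    """Rescale a map to [0, 1] for comparable fusion weights."""
    m = m - m.min()
    return m / (m.max() + 1e-8)

def highfreq_map(frame, radius=8):
    """Frequency branch: keep only high-frequency content of the Fourier
    spectrum, where GAN upsampling artifacts tend to concentrate."""
    F = np.fft.fftshift(np.fft.fft2(frame))
    h, w = frame.shape
    yy, xx = np.ogrid[:h, :w]
    keep_high = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 > radius ** 2
    residual = np.fft.ifft2(np.fft.ifftshift(F * keep_high))
    return normalize(np.abs(residual))

def edge_map(frame):
    """Spatial branch: gradient magnitude as a crude stand-in for Canny,
    highlighting edge irregularities around blended regions."""
    gy, gx = np.gradient(frame)
    return normalize(np.hypot(gx, gy))

def temporal_map(prev_frame, curr_frame):
    """Temporal branch: absolute frame difference as a crude proxy for
    dense-optical-flow inconsistency between consecutive frames."""
    return normalize(np.abs(curr_frame - prev_frame))

def anomaly_mask(prev_frame, curr_frame, weights=(0.4, 0.3, 0.3), thresh=0.5):
    """Fuse the three branches into a binary pixel-level anomaly mask."""
    fused = (weights[0] * highfreq_map(curr_frame)
             + weights[1] * edge_map(curr_frame)
             + weights[2] * temporal_map(prev_frame, curr_frame))
    return (normalize(fused) > thresh).astype(np.uint8)
```

In the actual framework this mask is not the final output; it is passed alongside the frame to DistXCNet, which restricts its attention to the flagged regions when classifying the content as authentic or manipulated.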
Datasets
FaceForensics++ (FF++), Celeb-DF (CBDF), Versatile Deepfake Dataset (VDFD)
Model(s)
DistXCNet (distilled from Xception), Mask R-CNN (for face segmentation during preprocessing)
Author countries
USA