MD-CSDNetwork: Multi-Domain Cross Stitched Network for Deepfake Detection

Authors: Aayushi Agarwal, Akshay Agarwal, Sayan Sinha, Mayank Vatsa, Richa Singh

Published: 2021-09-15 14:11:53+00:00

AI Summary

The paper proposes MD-CSDNetwork, a cross-stitched network that combines spatial and frequency domain features for deepfake detection. This multi-domain approach improves performance and generalization by treating spatial and frequency features as related supervisory signals, learning an optimal combination of domain-specific and shared representations.
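As a worked formula, here is the standard cross-stitch unit from Misra et al. ("Cross-stitch Networks for Multi-task Learning", CVPR 2016). It mixes the two branches' activations at one layer using a learned 2×2 matrix. This assumes MD-CSDNetwork follows that formulation, which this summary does not confirm. Here x_S^{ij} and x_F^{ij} denote the spatial- and frequency-branch activations at location (i, j):

```latex
\begin{bmatrix} \tilde{x}_S^{ij} \\ \tilde{x}_F^{ij} \end{bmatrix}
=
\begin{bmatrix} \alpha_{SS} & \alpha_{SF} \\ \alpha_{FS} & \alpha_{FF} \end{bmatrix}
\begin{bmatrix} x_S^{ij} \\ x_F^{ij} \end{bmatrix}
```

The diagonal weights \alpha_{SS} and \alpha_{FF} set how much each branch keeps its own domain-specific features. The off-diagonal weights \alpha_{SF} and \alpha_{FS} set how much it borrows from the other domain. If both off-diagonal weights are zero, the model reduces to two independent networks.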

Abstract

The rapid progress in the ease of creating and spreading ultra-realistic media over social platforms creates an urgent need for a generalizable deepfake detection technique. It has been observed that current deepfake generation methods leave discriminative artifacts in the frequency spectrum of fake images and videos. Inspired by this observation, in this paper we present a novel approach, termed MD-CSDNetwork, that combines features in the spatial and frequency domains to mine a shared discriminative representation for classifying deepfakes. MD-CSDNetwork is a novel cross-stitched network with two parallel branches carrying the spatial and frequency information, respectively. We hypothesize that these multi-domain input data streams can be considered related supervisory signals. Supervision from both branches yields better performance and generalization. Further, cross-stitch connections are inserted between the two branches so that the network automatically learns an optimal combination of domain-specific representations and representations shared with the other domain. Extensive experiments are conducted on the popular benchmark dataset FaceForensics++ for forgery classification. We report improvements over all manipulation types in the FaceForensics++ dataset. For cross-database evaluation on the Celeb-DF dataset and the Deepfake Detection Dataset, our results are comparable to those of state-of-the-art methods.
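The following is a minimal NumPy sketch of how an image could be turned into the DCT spectrum that feeds the frequency branch. Several choices here are assumptions for illustration: the grayscale conversion (BT.601 luma weights), the orthonormal DCT-II, and the log-magnitude scaling. This summary does not specify MD-CSDNetwork's exact preprocessing. The names `dct_matrix`, `dct2`, and `log_dct_spectrum` are hypothetical helpers.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis: row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]          # frequency index (rows)
    i = np.arange(n)[None, :]          # sample index (columns)
    c = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] *= 1.0 / np.sqrt(2.0)      # DC row gets the extra 1/sqrt(2) factor
    return c * np.sqrt(2.0 / n)


def dct2(image: np.ndarray) -> np.ndarray:
    """2-D DCT-II of a single-channel image, computed as C_H @ X @ C_W^T."""
    h, w = image.shape
    return dct_matrix(h) @ image @ dct_matrix(w).T


def log_dct_spectrum(rgb: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Hypothetical preprocessing: RGB -> grayscale -> 2-D DCT -> log magnitude."""
    gray = rgb.astype(np.float64) @ np.array([0.299, 0.587, 0.114])
    return np.log(np.abs(dct2(gray)) + eps)   # eps avoids log(0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    face = rng.integers(0, 256, size=(64, 64, 3))   # stand-in for a face crop
    print(log_dct_spectrum(face).shape)              # (64, 64)
```

For real images, `scipy.fft.dctn(gray, norm="ortho")` should give the same coefficients as `dct2` without building the basis matrices explicitly.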


Key findings
MD-CSDNetwork shows significant improvements over the XceptionNet baseline on FaceForensics++, with results comparable to or better than those of state-of-the-art methods. It also generalizes well across datasets (Celeb-DF and the Deepfake Detection Dataset). The discrete cosine transform (DCT) proves a better choice for frequency feature extraction than the discrete wavelet transform (DWT) and the fast Fourier transform (FFT).
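To make the transform comparison concrete, this hedged NumPy sketch computes the two alternatives to the DCT: a centred log-magnitude FFT spectrum and a single-level orthonormal 2-D Haar DWT. The Haar wavelet, the single decomposition level, and the helper names `fft_log_magnitude` and `haar_dwt2` are assumptions, not details taken from the paper.

```python
import numpy as np


def fft_log_magnitude(gray: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Centred log-magnitude spectrum from the 2-D FFT (phase is discarded)."""
    spectrum = np.fft.fftshift(np.fft.fft2(gray))   # move the DC term to the centre
    return np.log(np.abs(spectrum) + eps)


def haar_dwt2(gray: np.ndarray):
    """One level of the orthonormal 2-D Haar DWT; both image sides must be even.

    Returns the approximation band LL and three detail bands. Sub-band naming
    (LH vs HL) differs between libraries.
    """
    a = gray[0::2, 0::2]   # top-left pixel of each 2x2 block
    b = gray[0::2, 1::2]   # top-right
    c = gray[1::2, 0::2]   # bottom-left
    d = gray[1::2, 1::2]   # bottom-right
    ll = (a + b + c + d) / 2.0   # local average (scaled)
    lh = (a + b - c - d) / 2.0   # top-vs-bottom difference
    hl = (a - b + c - d) / 2.0   # left-vs-right difference
    hh = (a - b - c + d) / 2.0   # diagonal difference
    return ll, (lh, hl, hh)
```

These are properties of the transforms themselves, not the paper's explanation of why the DCT performed best. The FFT magnitude discards phase, and each Haar band has half the spatial resolution of the input. The orthonormal DCT, by contrast, yields a single real-valued, full-resolution coefficient map.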
Approach
MD-CSDNetwork uses two parallel branches: one processes spatial information and the other processes frequency information (the DCT spectrum) from the same image. Cross-stitch units connect these branches at multiple layers. This lets the network learn an optimal combination of domain-specific and shared representations, which improves both performance and generalization.
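The following forward-pass-only NumPy sketch shows cross-stitch units placed after every layer of two parallel branches. It rests on several assumptions. Small dense layers stand in for the XceptionNet blocks. Each unit holds one 2×2 mixing matrix shared across all features. The 0.9/0.1 near-identity initialisation is an illustrative choice. `CrossStitchUnit` and `two_branch_forward` are hypothetical names. In a trained model, `alpha` would be a learnable parameter updated by backpropagation.

```python
import numpy as np


class CrossStitchUnit:
    """Mixes two same-shaped activations with a 2x2 matrix (forward pass only)."""

    def __init__(self, self_weight: float = 0.9, cross_weight: float = 0.1):
        # Near-identity start: each branch mostly keeps its own features.
        self.alpha = np.array([[self_weight, cross_weight],
                               [cross_weight, self_weight]])

    def __call__(self, x_spatial: np.ndarray, x_freq: np.ndarray):
        a = self.alpha
        out_spatial = a[0, 0] * x_spatial + a[0, 1] * x_freq
        out_freq = a[1, 0] * x_spatial + a[1, 1] * x_freq
        return out_spatial, out_freq


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def two_branch_forward(x_spatial, x_freq, spatial_layers, freq_layers, stitches):
    """Toy two-branch network: dense + ReLU blocks with a cross-stitch after each."""
    for w_s, w_f, stitch in zip(spatial_layers, freq_layers, stitches):
        h_s = relu(x_spatial @ w_s)             # spatial-branch block
        h_f = relu(x_freq @ w_f)                # frequency-branch block
        x_spatial, x_freq = stitch(h_s, h_f)    # exchange information between domains
    return x_spatial, x_freq


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dims = [16, 32, 32, 8]
    shapes = list(zip(dims[:-1], dims[1:]))
    spatial_layers = [rng.normal(size=s) for s in shapes]
    freq_layers = [rng.normal(size=s) for s in shapes]
    stitches = [CrossStitchUnit() for _ in shapes]
    s, f = two_branch_forward(rng.normal(size=(4, 16)), rng.normal(size=(4, 16)),
                              spatial_layers, freq_layers, stitches)
    print(s.shape, f.shape)  # (4, 8) (4, 8)
```

The two extreme settings show the range a cross-stitch unit can cover. With `CrossStitchUnit(1.0, 0.0)` the branches stay fully independent. With `CrossStitchUnit(0.5, 0.5)` they share identical features after every layer. How MD-CSDNetwork fuses the two branch outputs for the final real/fake decision is not described in this summary, so the sketch stops at the branch outputs.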
Datasets
FaceForensics++, Celeb-DF, Deepfake Detection Dataset
Model(s)
XceptionNet as a backbone for both spatial and frequency branches, with added cross-stitch units.
Author countries
India, USA