Deepfake Detection via Joint Unsupervised Reconstruction and Supervised Classification

Authors: Bosheng Yan, Chang-Tsun Li, Xuequan Lu

Published: 2022-11-24 05:44:26+00:00

AI Summary

This paper proposes a novel deepfake detection method using a two-branch convolutional autoencoder that jointly performs unsupervised reconstruction and supervised classification. This approach improves generalization to unseen manipulation methods by leveraging both local and global spatial information learned from both tasks, achieving state-of-the-art performance in cross-dataset evaluation.

Abstract

Deep learning has enabled realistic face manipulation (i.e., deepfake), which poses significant concerns over the integrity of the media in circulation. Most existing deep learning techniques for deepfake detection can achieve promising performance in the intra-dataset evaluation setting (i.e., training and testing on the same dataset), but are unable to perform satisfactorily in the inter-dataset evaluation setting (i.e., training on one dataset and testing on another). Most of the previous methods use the backbone network to extract global features for making predictions and only employ binary supervision (i.e., indicating whether the training instances are fake or authentic) to train the network. Classification based merely on the learning of global features often leads to weak generalizability to unseen manipulation methods. In addition, the reconstruction task can improve the learned representations. In this paper, we introduce a novel approach for deepfake detection, which considers the reconstruction and classification tasks simultaneously to address these problems. This method shares the information learned by one task with the other, focusing on an aspect that other existing works rarely consider, and hence boosts the overall performance. In particular, we design a two-branch Convolutional AutoEncoder (CAE), in which the Convolutional Encoder used to compress the feature map into the latent representation is shared by both branches. Then the latent representation of the input data is fed to a simple classifier and the unsupervised reconstruction component simultaneously. Our network is trained end-to-end. Experiments demonstrate that our method achieves state-of-the-art performance on three commonly-used datasets, particularly in the cross-dataset evaluation setting.


Key findings
The proposed method achieves state-of-the-art performance on three deepfake datasets, particularly in cross-dataset evaluation. The joint unsupervised reconstruction and supervised classification improves generalization to unseen manipulation techniques. Ablation studies demonstrate the effectiveness of both the unsupervised learning component and data augmentation from other datasets.
Approach
The authors propose a two-branch convolutional autoencoder (CAE) with a shared encoder. The encoder extracts features from input images, which are then fed to a classifier for deepfake detection and a decoder for image reconstruction. The model is trained end-to-end using a combined loss function that considers both classification accuracy and reconstruction error.
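The combined objective described above can be sketched as a classification term plus a weighted reconstruction term. The sketch below uses binary cross-entropy and mean-squared error with a weight `lam`; the specific loss forms and the value of `lam` are illustrative assumptions, not values reported in the paper.

```python
import math

def combined_loss(logit, label, recon, image, lam=1.0):
    """Joint objective: binary cross-entropy on the classifier logit plus a
    weighted mean-squared reconstruction error.

    `lam` balances the two tasks; its value here is an assumption for
    illustration, not taken from the paper.
    """
    # Sigmoid probability that the input is fake
    p = 1.0 / (1.0 + math.exp(-logit))
    # Binary cross-entropy between prediction and ground-truth label (0 or 1)
    bce = -(label * math.log(p) + (1 - label) * math.log(1 - p))
    # Mean-squared error between reconstructed and original pixels
    mse = sum((r - i) ** 2 for r, i in zip(recon, image)) / len(image)
    return bce + lam * mse
```

Training end-to-end with such a loss means gradients from both branches flow back into the shared encoder, so the latent representation must serve classification and reconstruction at once.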
Datasets
UADFV, FaceForensics++, Celeb-DF
Model(s)
Two-branch Convolutional AutoEncoder (CAE) with Xception as the backbone for the shared encoder, a linear classifier, and a decoder.
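A minimal PyTorch sketch of the two-branch layout: a shared convolutional encoder whose latent code feeds both a linear classifier and a reconstruction decoder. The layer sizes, latent dimension, and simple conv stack here are assumptions for illustration; the paper uses an Xception backbone as the shared encoder.

```python
import torch
import torch.nn as nn

class TwoBranchCAE(nn.Module):
    """Sketch of a two-branch convolutional autoencoder: one shared encoder,
    one classification branch, one reconstruction branch.

    The small conv stack stands in for the Xception backbone; all sizes
    are illustrative assumptions.
    """
    def __init__(self, latent_dim=128):
        super().__init__()
        # Shared convolutional encoder (stand-in for Xception)
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.to_latent = nn.Linear(32 * 16 * 16, latent_dim)
        # Branch 1: linear classifier producing a real/fake logit
        self.classifier = nn.Linear(latent_dim, 1)
        # Branch 2: decoder reconstructing the input image from the latent code
        self.from_latent = nn.Linear(latent_dim, 32 * 16 * 16)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        # Shared latent representation used by both branches
        z = self.to_latent(self.encoder(x).flatten(1))
        logit = self.classifier(z)
        recon = self.decoder(self.from_latent(z).view(-1, 32, 16, 16))
        return logit, recon

model = TwoBranchCAE()
x = torch.rand(2, 3, 64, 64)        # batch of two 64x64 RGB face crops
logit, recon = model(x)             # classification logit and reconstruction
```

Because the encoder is shared, both the classification loss and the reconstruction loss update its weights during end-to-end training.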
Author countries
Australia