Facial Forgery-based Deepfake Detection using Fine-Grained Features

Authors: Aakash Varma Nadimpalli, Ajita Rattani

Published: 2023-10-10 21:30:05+00:00

AI Summary

This paper proposes a novel fine-grained approach for deepfake detection that improves generalization across datasets and manipulation techniques. The method focuses on learning subtle, local features by suppressing background noise and fusing features at various scales, outperforming existing methods in cross-dataset and cross-manipulation scenarios.

Abstract

Facial forgery by deepfakes has caused major security risks and raised severe societal concerns. As a countermeasure, a number of deepfake detection methods have been proposed. Most of them model deepfake detection as a binary classification problem using a backbone convolutional neural network (CNN) architecture pretrained for the task. These CNN-based methods have demonstrated very high efficacy in deepfake detection with the Area under the Curve (AUC) as high as $0.99$. However, the performance of these methods degrades significantly when evaluated across datasets and deepfake manipulation techniques. This draws our attention towards learning more subtle, local, and discriminative features for deepfake detection. In this paper, we formulate deepfake detection as a fine-grained classification problem and propose a new fine-grained solution to it. Specifically, our method is based on learning subtle and generalizable features by effectively suppressing background noise and learning discriminative features at various scales for deepfake detection. Through extensive experimental validation, we demonstrate the superiority of our method over the published research in cross-dataset and cross-manipulation generalization of deepfake detectors for the majority of the experimental scenarios.
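For readers unfamiliar with the AUC metric quoted in the abstract, the following is a minimal sketch of how it can be computed from detector scores. It uses the pairwise-ranking definition: the fraction of (fake, real) pairs in which the fake sample receives the higher score. The function name `roc_auc`, the label convention (1 = fake, 0 = real), and the sample scores are illustrative assumptions, not taken from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking (ties count as half a win).

    labels: 1 for fake, 0 for real (an assumed convention).
    scores: detector outputs, where a higher score means "more likely fake".
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four videos: two real, two fake.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)
```

In a cross-dataset evaluation, the same computation would be applied to scores produced on a dataset the detector was not trained on; a drop in AUC there is the generalization gap the abstract describes.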

Key findings

The proposed fine-grained approach outperforms eight baseline models in most cross-dataset and cross-manipulation generalization scenarios. These results support the claim that learning subtle, local features makes deepfake detection more robust. Ablation studies confirm the importance of the background suppression and high-temperature refinement modules.

Approach

The authors formulate deepfake detection as a fine-grained classification problem. Their method builds on a backbone network (Swin Transformer-L, EfficientNet-B4, or a hybrid that concatenates features from both), followed by top-down and bottom-up feature fusion modules, background suppression, and high-temperature refinement, so that the model learns subtle and generalizable features.
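The summary does not spell out how background suppression or high-temperature refinement are implemented, so the following is only a rough sketch of the general idea under stated assumptions. It assumes that each image patch has its own class logits, that the `k` most confidently classified patches are treated as foreground and the rest as background to suppress, and that "high temperature" refers to dividing logits by a temperature above 1 before the softmax. The names `softmax`, `split_foreground`, and `patch_logits` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a flatter distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def split_foreground(patch_logits, k):
    """Return (kept, suppressed) patch indices, ranked by top class probability.

    Assumption: patches the classifier is most confident about carry the
    discriminative detail; the remaining patches are treated as background.
    """
    confidence = [max(softmax(row)) for row in patch_logits]
    order = sorted(range(len(confidence)), key=lambda i: confidence[i], reverse=True)
    return sorted(order[:k]), sorted(order[k:])


# Hypothetical logits (real vs. fake) for four patches of one face crop.
patch_logits = [
    [4.0, 0.0],  # confident patch
    [0.1, 0.0],  # near-uniform, likely background
    [0.0, 3.0],  # confident patch
    [0.2, 0.1],  # near-uniform, likely background
]
kept, suppressed = split_foreground(patch_logits, k=2)
print(kept, suppressed)
```

A practical implementation would operate on batched tensors inside the network and apply a loss to the suppressed patches; this sketch only shows the selection step and the effect of the temperature parameter.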

Datasets

FaceForensics++ (c23 version), Celeb-DF, DFDC

Model(s)

Swin Transformer-L, EfficientNet-B4, Hybrid model (concatenation of Swin Transformer-L and EfficientNet-B4 features), ResNet-50, XceptionNet, MesoInceptionNet-4, CNN-LSTM, ViT, Swin-B, LITv2-B
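The hybrid model is described only as a concatenation of features from the two backbones. As an illustration of what that could look like, the sketch below joins two feature vectors and scores the result with a single linear layer and a sigmoid. The function `fuse_and_score`, the toy feature sizes, the weights, and the linear head are all assumptions made for the example; the summary does not state what follows the concatenation.

```python
import math


def fuse_and_score(feat_a, feat_b, weights, bias=0.0):
    """Concatenate two feature vectors and return (fused, probability of 'fake')."""
    fused = list(feat_a) + list(feat_b)
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused feature length")
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return fused, 1.0 / (1.0 + math.exp(-logit))


# Toy stand-ins for the two backbone embeddings (real embeddings are far larger).
swin_feat = [0.5, -1.0, 0.25]
effnet_feat = [1.0, 0.0]
weights = [0.2, -0.1, 0.4, 0.3, 0.5]  # hypothetical classifier weights

fused, prob = fuse_and_score(swin_feat, effnet_feat, weights)
print(len(fused), round(prob, 3))
```

Concatenation keeps both representations intact and leaves it to the classifier to weight them, at the cost of a wider input to the final layer.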

Author countries

USA
