Adversarially Robust Deepfake Detection via Adversarial Feature Similarity Learning

Authors: Sarwar Khan

Published: 2024-02-06 11:35:05+00:00

AI Summary

This paper proposes Adversarial Feature Similarity Learning (AFSL) for adversarially robust deepfake detection. AFSL integrates three deep feature learning paradigms: it distinguishes real from fake videos by optimizing the similarity between samples and weight vectors, maximizes the similarity between each sample and its adversarially perturbed counterpart, and maximizes the dissimilarity between real and fake samples.

Abstract

Deepfake technology has raised concerns about the authenticity of digital content, necessitating the development of effective detection methods. However, the widespread availability of deepfakes has given rise to a new challenge in the form of adversarial attacks. Adversaries can manipulate deepfake videos with small, imperceptible perturbations that deceive detection models into producing incorrect outputs. To tackle this critical issue, we introduce Adversarial Feature Similarity Learning (AFSL), which integrates three fundamental deep feature learning paradigms. By optimizing the similarity between samples and weight vectors, our approach aims to distinguish between real and fake instances. Additionally, we aim to maximize the similarity between adversarially perturbed examples and their unperturbed counterparts, regardless of their real or fake nature. Moreover, we introduce a regularization technique that maximizes the dissimilarity between real and fake samples, ensuring a clear separation between these two categories. In extensive experiments on popular deepfake datasets, including FaceForensics++, FaceShifter, and DeeperForensics, the proposed method significantly outperforms other standard adversarial training-based defense methods. This further demonstrates the effectiveness of our approach in protecting deepfake detectors from adversarial attacks.


Key findings
AFSL significantly outperforms standard adversarial training-based defense methods on FaceForensics++, FaceShifter, and DeeperForensics. It generalizes robustly across different deepfake creation methods and common video distortions. The ablation study shows that all three components of the loss function contribute to the improved performance.
Approach
AFSL uses a three-part loss function: a deepfake classification loss, an adversarial similarity loss (maximizing similarity between original and perturbed samples), and a similarity regularization loss (maximizing dissimilarity between real and fake samples). This approach improves robustness against adversarial attacks while maintaining performance on unperturbed data.
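The three-part loss described above can be illustrated with a minimal, framework-free sketch. This is not the authors' implementation: the use of cosine similarity, the softmax scale, the hinge at zero in the regularizer, the weights `lam_adv` and `lam_reg`, and every function name are assumptions chosen for illustration, since this summary does not give the exact formulas.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)

def classification_loss(feats, labels, class_weights, scale=10.0):
    """Cross-entropy over scaled cosine similarities between each sample
    and each class weight vector (assumed form of the classification term)."""
    total = 0.0
    for f, y in zip(feats, labels):
        logits = [scale * cosine(f, w) for w in class_weights]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[y]
    return total / len(feats)

def adversarial_similarity_loss(clean_feats, adv_feats):
    """Mean (1 - cosine) between each sample and its perturbed counterpart;
    minimizing it maximizes their similarity."""
    n = len(clean_feats)
    return sum(1.0 - cosine(c, a) for c, a in zip(clean_feats, adv_feats)) / n

def similarity_regularization_loss(feats, labels):
    """Mean positive cosine similarity over all real/fake pairs
    (label 0 = real, 1 = fake); minimizing it pushes the classes apart."""
    real = [f for f, y in zip(feats, labels) if y == 0]
    fake = [f for f, y in zip(feats, labels) if y == 1]
    if not real or not fake:
        return 0.0
    pairs = [max(0.0, cosine(r, k)) for r in real for k in fake]
    return sum(pairs) / len(pairs)

def afsl_style_loss(feats, adv_feats, labels, class_weights,
                    lam_adv=1.0, lam_reg=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (classification_loss(feats, labels, class_weights)
            + lam_adv * adversarial_similarity_loss(feats, adv_feats)
            + lam_reg * similarity_regularization_loss(feats, labels))
```

In a real training loop, `feats` and `adv_feats` would come from the detector's feature extractor applied to clean and adversarially perturbed inputs, and the loss would be computed in an automatic-differentiation framework rather than on Python lists.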
Datasets
FaceForensics++, FaceShifter, DeeperForensics
Model(s)
Channel-Separated Convolutional Network (CSN), XceptionNet, MesoNet
Author countries
Taiwan