XAI-Based Detection of Adversarial Attacks on Deepfake Detectors

Authors: Ben Pinhasov, Raz Lapid, Rony Ohayon, Moshe Sipper, Yehudit Aperstein

Published: 2024-03-05 13:25:30+00:00

Comment: Accepted at TMLR 2024

Journal Ref: Transactions on Machine Learning Research, 2024

AI Summary

This paper introduces a novel XAI-based methodology for detecting adversarial attacks on deepfake detectors. The approach leverages interpretability maps generated by XAI techniques, combined with the original input image, to train a classifier that identifies the presence of adversarial perturbations. The method enhances the robustness of deepfake detection systems without altering their core performance, while providing insight into potential vulnerabilities.

Abstract

We introduce a novel methodology for identifying adversarial attacks on deepfake detectors using eXplainable Artificial Intelligence (XAI). In an era characterized by digital advancement, deepfakes have emerged as a potent tool, creating a demand for efficient detection systems. However, these systems are frequently targeted by adversarial attacks that degrade their performance. We address this gap, developing a defensible deepfake detector by leveraging the power of XAI. The proposed methodology uses XAI to generate interpretability maps for a given detection method, providing explicit visualizations of the decision-making factors within the AI models. We subsequently employ a pretrained feature extractor that processes both the input image and its corresponding XAI image. The feature embeddings extracted from this process are then used to train a simple yet effective classifier. Our approach contributes not only to the detection of deepfakes but also enhances the understanding of possible adversarial attacks, pinpointing potential vulnerabilities. Furthermore, this approach does not change the performance of the deepfake detector. The paper demonstrates promising results, suggesting a potential pathway for future deepfake detection mechanisms. We believe this study will serve as a valuable contribution to the community, sparking much-needed discourse on safeguarding deepfake detectors.


Key findings
The XAI-based approach effectively detects adversarial attacks on visual deepfake detectors, with Saliency and Guided Backpropagation generally yielding the highest accuracy, especially when the full model is finetuned. The method shows promising generalization across various adversarial attack types (PGD, FGSM, APGD, NES, Square) and different deepfake detector backbones (XceptionNet, EfficientNetB4ST). While XAI integration introduces computational overhead, a balance between explainability and performance can be achieved depending on the chosen XAI technique.
Approach
The proposed method generates interpretability maps using XAI techniques (e.g., Saliency, Guided Backpropagation) for a deepfake detector's decision on a face crop. A pretrained ResNet50 model then processes both the original face crop and its corresponding XAI map to extract feature embeddings. These embeddings are subsequently used to train a simple two-layer linear classifier (Detect-ResNet50) to determine if the input has been subjected to an adversarial attack.
Datasets
FaceForensics++ (FF++)
Model(s)
XceptionNet and EfficientNetB4ST (deepfake detector backbones); ResNet50 (feature extractor)
Author countries
Israel