Improving the Perturbation-Based Explanation of Deepfake Detectors Through the Use of Adversarially-Generated Samples

Authors: Konstantinos Tsigos, Evlampios Apostolidis, Vasileios Mezaris

Published: 2025-02-06 10:47:34+00:00

AI Summary

This paper proposes a novel perturbation approach for improving the explanation of deepfake detectors using adversarially-generated samples. These samples, generated via Natural Evolution Strategies, aim to flip the deepfake detector's decision, leading to more accurate visual explanations of manipulated image regions.

Abstract

In this paper, we introduce the idea of using adversarially-generated samples of the input images that were classified as deepfakes by a detector, to form perturbation masks for inferring the importance of different input features and producing visual explanations. We generate these samples based on Natural Evolution Strategies, aiming to flip the original deepfake detector's decision and classify these samples as real. We apply this idea to four perturbation-based explanation methods (LIME, SHAP, SOBOL and RISE) and evaluate the performance of the resulting modified methods using a SOTA deepfake detection model, a benchmarking dataset (FaceForensics++) and a corresponding explanation evaluation framework. Our quantitative assessments document the mostly positive contribution of the proposed perturbation approach to the performance of the explanation methods. Our qualitative analysis shows the capacity of the modified explanation methods to demarcate the manipulated image regions more accurately, and thus to provide more useful explanations.


Key findings
Quantitative and qualitative evaluations show that the proposed approach improves the accuracy of the explanation methods in identifying manipulated regions of deepfake images in most cases, with the modified LIME method performing best. The added computational cost is deemed acceptable given the performance gains.
Approach
The authors use adversarially-generated samples of deepfake images, created using Natural Evolution Strategies to fool the detector. These samples are then used to create perturbation masks for four existing explanation methods (LIME, SHAP, SOBOL, RISE), improving their ability to identify manipulated regions.
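The core NES step can be illustrated with a minimal sketch. This is not the authors' implementation: the real detector is an EfficientNet-based model operating on face images, whereas here a toy logistic "detector" stands in so the example is self-contained, and all names (`detector_fake_prob`, `nes_flip`) and hyperparameters are illustrative assumptions. The sketch shows the black-box NES gradient estimate (antithetic sampling) used to push a sample classified as fake toward being classified as real.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in "deepfake detector": probability that the input is fake.
# The paper uses a SOTA EfficientNet-based detector; this logistic model
# is only a placeholder to keep the sketch runnable.
w = rng.normal(size=64)
w /= np.linalg.norm(w)

def detector_fake_prob(x):
    return 1.0 / (1.0 + np.exp(-(x @ w)))

def nes_flip(x, n_samples=50, sigma=0.1, lr=1.0, steps=200):
    """Black-box NES attack: lower the detector's 'fake' probability
    using only queries to the model (no gradients)."""
    x = x.copy()
    for _ in range(steps):
        eps = rng.normal(size=(n_samples, x.size))
        # Antithetic sampling: query the detector at x + sigma*e and x - sigma*e.
        f_pos = np.array([detector_fake_prob(x + sigma * e) for e in eps])
        f_neg = np.array([detector_fake_prob(x - sigma * e) for e in eps])
        # NES gradient estimate of the fake probability w.r.t. the input.
        grad = ((f_pos - f_neg)[:, None] * eps).sum(axis=0) / (2 * sigma * n_samples)
        x -= lr * grad  # gradient descent: move toward the "real" decision
        if detector_fake_prob(x) < 0.5:
            break  # decision flipped
    return x

# A sample the toy detector classifies as fake (logit biased positive).
x0 = 2.0 * w + 0.1 * rng.normal(size=64)
adv = nes_flip(x0)
```

In the paper, such an adversarially-modified sample is not the end product: it supplies the replacement content when regions of the original image are perturbed, so that each explanation method's masks measure how strongly each region drives the "fake" decision.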
Datasets
FaceForensics++
Model(s)
EfficientNet-based deepfake detection model (pretrained model from [36])
Author countries
Greece