Towards Quantitative Evaluation of Explainable AI Methods for Deepfake Detection

Authors: Konstantinos Tsigos, Evlampios Apostolidis, Spyridon Baxevanakis, Symeon Papadopoulos, Vasileios Mezaris

Published: 2024-04-29 12:32:14+00:00

AI Summary

This paper introduces a novel framework for quantitatively evaluating explainable AI (XAI) methods in deepfake detection. The framework assesses XAI methods by measuring how effectively adversarial attacks, targeted at the regions identified as most influential by the XAI method, can reduce the deepfake detector's accuracy. This allows for a comparative study of XAI methods' ability to pinpoint critical regions for deepfake detection decisions.
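As an illustration of the first step of this pipeline (not code from the paper), the sketch below shows how an explanation method such as LIME could be used to flag the superpixels of a fake image that most influence the detector's decision. The `detector_fn` wrapper is a hypothetical stand-in for the actual deepfake detector; it is assumed to return per-class probabilities [p_real, p_fake].

```python
# Sketch: select the most influential image regions according to a LIME explanation.
# Assumption: `detector_fn` wraps the deepfake detector and returns an (N, 2) array
# of [p_real, p_fake] probabilities for a batch of HxWx3 images.

import numpy as np
from lime import lime_image


def detector_fn(images: np.ndarray) -> np.ndarray:
    """Hypothetical wrapper around the deepfake detector (e.g., an EfficientNet backbone)."""
    raise NotImplementedError


explainer = lime_image.LimeImageExplainer()


def top_region_mask(image: np.ndarray, num_regions: int = 5) -> np.ndarray:
    """Return a boolean HxW mask covering the `num_regions` superpixels that
    LIME ranks as most influential for the predicted class."""
    explanation = explainer.explain_instance(
        image,             # HxWx3 image
        detector_fn,       # classifier returning class probabilities
        top_labels=1,
        num_samples=1000,  # number of perturbed samples LIME evaluates
    )
    _, mask = explanation.get_image_and_mask(
        explanation.top_labels[0],
        positive_only=True,        # keep only regions supporting the prediction
        num_features=num_regions,  # number of superpixels to keep
        hide_rest=False,
    )
    return mask.astype(bool)
```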

Abstract

In this paper we propose a new framework for evaluating the performance of explanation methods on the decisions of a deepfake detector. This framework assesses the ability of an explanation method to spot the regions of a fake image with the biggest influence on the decision of the deepfake detector, by examining the extent to which these regions can be modified through a set of adversarial attacks in order to flip the detector's decision or reduce the confidence of its initial prediction; we anticipate a larger drop in deepfake detection accuracy and prediction confidence for methods that spot these regions more accurately. Based on this framework, we conduct a comparative study using a state-of-the-art model for deepfake detection that has been trained on the FaceForensics++ dataset, and five explanation methods from the literature. The findings of our quantitative and qualitative evaluations document the advanced performance of the LIME explanation method against the other compared ones, and indicate this method as the most appropriate for explaining the decisions of the utilized deepfake detector.


Key findings
LIME consistently outperformed the other examined explanation methods (Grad-CAM++, RISE, SHAP, SOBOL) in identifying the regions most influential for deepfake detection, producing the largest decrease in detection accuracy after the adversarial attacks. The proposed framework, which focuses on explanations for correctly classified fake images, yielded results that differ from those of existing evaluation frameworks, highlighting the importance of the evaluation context in XAI assessment.
Approach
The authors propose a framework that evaluates an explanation method by applying adversarial attacks to the regions that the method identifies as most influential. The effectiveness of the explanation method is then measured by the drop in deepfake detection accuracy after the attacks; the evaluation is performed only on fake images that the detector initially classifies correctly.
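A minimal sketch of this evaluation loop is given below, assuming a PyTorch detector and precomputed explanation masks. The single-step sign-gradient perturbation is a simple stand-in for the paper's set of adversarial attacks, and names such as `fake_images` and `explanation_masks` are illustrative, not from the authors' code.

```python
# Sketch: perturb only the regions flagged by an explanation method, re-run the
# detector, and measure the drop in accuracy on initially correct fake images.

import torch
import torch.nn.functional as F


def masked_attack(detector, image, mask, epsilon=0.03, fake_label=1):
    """One-step sign-gradient perturbation restricted to the explained regions.
    `image` is (3, H, W) in [0, 1]; `mask` is an (H, W) float tensor, 1 inside regions."""
    image = image.clone().requires_grad_(True)
    logits = detector(image.unsqueeze(0))
    loss = F.cross_entropy(logits, torch.tensor([fake_label]))
    loss.backward()
    step = epsilon * image.grad.sign() * mask  # perturb only the flagged regions
    return (image + step).clamp(0, 1).detach()


@torch.no_grad()
def detection_accuracy(detector, images, fake_label=1):
    """Fraction of images the detector still classifies as fake."""
    preds = detector(images).argmax(dim=1)
    return (preds == fake_label).float().mean().item()


def evaluate_explanation_method(detector, fake_images, explanation_masks):
    """Accuracy drop caused by attacking only the regions an explanation method flagged."""
    acc_before = detection_accuracy(detector, fake_images)
    attacked = torch.stack([
        masked_attack(detector, img, msk)
        for img, msk in zip(fake_images, explanation_masks)
    ])
    acc_after = detection_accuracy(detector, attacked)
    return acc_before - acc_after
```

The returned accuracy drop is the quantity compared across explanation methods: a larger drop indicates that the method pinpointed regions the detector actually relies on.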
Datasets
FaceForensics++
Model(s)
EfficientNet-B7
Author countries
Greece