Explainable deepfake and spoofing detection: an attack analysis using SHapley Additive exPlanations

Authors: Wanying Ge, Massimiliano Todisco, Nicholas Evans

Published: 2022-02-28 11:22:05+00:00

AI Summary

This paper extends previous work on explainable deepfake and spoofing detection by applying SHapley Additive exPlanations (SHAP) to analyze different attack algorithms. Using classifiers operating on raw waveforms and magnitude spectrograms, it identifies attack-specific artifacts and reveals differences and consistencies between synthetic speech and converted voice spoofing attacks.

Abstract

Despite several years of research in deepfake and spoofing detection for automatic speaker verification, little is known about the artefacts that classifiers use to distinguish between bona fide and spoofed utterances. An understanding of these is crucial to the design of trustworthy, explainable solutions. In this paper we report an extension of our previous work, which used SHapley Additive exPlanations (SHAP) to better understand classifier behaviour, to the analysis of spoofing attacks. Our goal is to identify the artefacts that characterise utterances generated by different attack algorithms. Using a pair of classifiers which operate either upon raw waveforms or magnitude spectrograms, we show that visualisations of SHAP results can be used to identify attack-specific artefacts and the differences and consistencies between synthetic speech and converted voice spoofing attacks.


Key findings
Analysis revealed attack-specific artifacts, particularly in vowel segments for three neural network-based text-to-speech attacks. Consistencies were observed across attack classes, especially in low-frequency bands, suggesting potential for generalization in spoofing detection. The use of different classifiers (operating on waveforms and spectrograms) highlighted the importance of considering multiple representations for comprehensive artifact identification.
Approach
The researchers used SHAP values to quantify the influence of individual input features (raw waveform samples or magnitude spectrogram bins) on the outputs of two classifiers trained on the ASVspoof 2019 Logical Access database. Visualizations of these SHAP values helped identify the attack-specific artifacts that characterize different spoofing algorithms.
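The core idea behind SHAP is to attribute a classifier's output score across its input features as Shapley values: the average marginal contribution of each feature over random orderings, with "absent" features replaced by a baseline. The sketch below is a minimal, self-contained Monte Carlo approximation of this attribution (not the authors' implementation, which would typically use a SHAP library explainer on a trained network); the toy linear "classifier" and four-sample "waveform" are hypothetical stand-ins for illustration.

```python
import numpy as np

def shapley_values(model, x, baseline, n_samples=200, rng=None):
    """Monte Carlo estimate of Shapley values for each input feature.

    For each sampled feature ordering, features are switched one at a
    time from the baseline value to the true value, and the resulting
    change in model output is credited to the switched feature. This
    mimics how SHAP attributes a detection score to individual
    waveform samples or spectrogram bins.
    """
    rng = np.random.default_rng(rng)
    d = x.shape[0]
    phi = np.zeros(d)
    for _ in range(n_samples):
        perm = rng.permutation(d)   # random feature ordering
        z = baseline.copy()
        prev = model(z)
        for i in perm:
            z[i] = x[i]             # add feature i to the coalition
            cur = model(z)
            phi[i] += cur - prev    # marginal contribution of i
            prev = cur
    return phi / n_samples

# Hypothetical stand-in for a spoofing classifier: a linear score
# over a short "waveform" of four samples (illustration only).
w = np.array([0.5, -1.0, 2.0, 0.0])
model = lambda z: float(w @ z)

x = np.ones(4)                       # input under analysis
baseline = np.zeros(4)               # reference ("feature absent") input
phi = shapley_values(model, x, baseline, n_samples=200, rng=0)

# For a linear model the Shapley values reduce to w * (x - baseline),
# and by the efficiency property they sum to model(x) - model(baseline).
print(phi)                           # ≈ [0.5, -1.0, 2.0, 0.0]
print(phi.sum(), model(x) - model(baseline))
```

In the paper's setting the model would be the 1D- or 2D-Res-TSSDNet and the features would be thousands of waveform samples or spectrogram bins, so practical SHAP computation relies on the efficient gradient- or deep-network-specific approximations provided by SHAP toolkits rather than on exhaustive permutation sampling.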
Datasets
ASVspoof 2019 Logical Access (LA) database
Model(s)
1D-Res-TSSDNet and 2D-Res-TSSDNet
Author countries
France