Toward Robust Real-World Audio Deepfake Detection: Closing the Explainability Gap

Authors: Georgia Channing, Juil Sock, Ronald Clark, Philip Torr, Christian Schroeder de Witt

Published: 2024-10-09 21:08:28+00:00

AI Summary

This paper introduces novel explainability methods for transformer-based audio deepfake detectors and open-sources a new benchmark for real-world generalizability. The improved explainability is intended to build trust with human experts and to help address the scalability challenge in audio deepfake detection by opening the way for citizen intelligence.

Abstract

The rapid proliferation of AI-manipulated or generated audio deepfakes poses serious challenges to media integrity and election security. Current AI-driven detection solutions lack explainability and underperform in real-world settings. In this paper, we introduce novel explainability methods for state-of-the-art transformer-based audio deepfake detectors and open-source a novel benchmark for real-world generalizability. By narrowing the explainability gap between transformer-based audio deepfake detectors and traditional methods, our results not only build trust with human experts, but also pave the way for unlocking the potential of citizen intelligence to overcome the scalability issue in audio deepfake detection.


Key findings
Transformer-based models (AST and Wav2Vec) significantly outperformed GBDT on unseen data from the FakeAVCeleb dataset. Attention roll-out proved useful for visualizing model attention, whereas occlusion results were less informative. The GBDT model's performance degraded significantly on compressed and rerecorded audio.
Approach
The researchers address the explainability gap in transformer-based audio deepfake detection by using attention roll-out to visualize attention weights across layers and occlusion to identify important regions in audio spectrograms. They also introduce a new benchmark using ASVspoof 5 for training and FakeAVCeleb for testing.
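The two explainability techniques named above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `attention_rollout` and `occlusion_map` are hypothetical, the roll-out follows the commonly used formulation (average over heads, add the identity for the residual connection, renormalize, multiply across layers), and the patch size, fill value, and `score_fn` callable are placeholders for whatever detector and settings are actually used.

```python
import numpy as np


def attention_rollout(attentions):
    """Combine per-layer attention into one token-to-token relevance matrix.

    attentions: list of arrays, one per layer, each shaped
    (heads, tokens, tokens) with rows summing to 1.
    """
    n_tokens = attentions[0].shape[-1]
    rollout = np.eye(n_tokens)
    for layer_attn in attentions:
        a = layer_attn.mean(axis=0)                 # average over heads
        a = 0.5 * a + 0.5 * np.eye(n_tokens)        # account for the residual path
        a = a / a.sum(axis=-1, keepdims=True)       # keep rows normalized
        rollout = a @ rollout                       # propagate through this layer
    return rollout


def occlusion_map(spectrogram, score_fn, patch=(16, 16), fill=0.0):
    """Score drop when each (frequency, time) patch is masked out.

    score_fn is an assumed callable mapping a 2-D spectrogram to a scalar
    score (for example, the detector's probability of "fake").
    """
    base = score_fn(spectrogram)
    heat = np.zeros(spectrogram.shape, dtype=float)
    n_freq, n_time = spectrogram.shape
    for f in range(0, n_freq, patch[0]):
        for t in range(0, n_time, patch[1]):
            occluded = spectrogram.copy()
            occluded[f:f + patch[0], t:t + patch[1]] = fill
            # A large positive value means the patch mattered to the score.
            heat[f:f + patch[0], t:t + patch[1]] = base - score_fn(occluded)
    return heat
```

For a spectrogram transformer, the row of the roll-out matrix belonging to the classification token can be reshaped onto the patch grid to give a heat map over the spectrogram; the occlusion map is already in spectrogram coordinates.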
Datasets
ASVspoof 5, FakeAVCeleb, Compressed ASVspoof, Rerecorded ASVspoof
Model(s)
Gradient Boosting Decision Trees (GBDT), Audio Spectrogram Transformer (AST), Wav2Vec-based transformer
Author countries
United Kingdom