DeePen: Penetration Testing for Audio Deepfake Detection

Authors: Nicolas Müller, Piotr Kawa, Adriana Stan, Thien-Phuc Doan, Souhwan Jung, Wei Herng Choong, Philip Sperl, Konstantin Böttinger

Published: 2025-02-27 12:26:25+00:00

AI Summary

This paper introduces DeePen, a penetration testing methodology for evaluating the robustness of audio deepfake detection models. Operating without access to or knowledge of the target models, DeePen applies signal processing modifications (attacks) to probe model vulnerabilities, revealing that all tested systems, both commercial and academic, can be deceived by simple manipulations.

Abstract

Deepfakes - manipulated or forged audio and video media - pose significant security risks to individuals, organizations, and society at large. To address these challenges, machine learning-based classifiers are commonly employed to detect deepfake content. In this paper, we assess the robustness of such classifiers through a systematic penetration testing methodology, which we introduce as DeePen. Our approach operates without prior knowledge of or access to the target deepfake detection models. Instead, it leverages a set of carefully selected signal processing modifications - referred to as attacks - to evaluate model vulnerabilities. Using DeePen, we analyze both real-world production systems and publicly available academic model checkpoints, demonstrating that all tested systems exhibit weaknesses and can be reliably deceived by simple manipulations such as time-stretching or echo addition. Furthermore, our findings reveal that while some attacks can be mitigated by retraining detection systems with knowledge of the specific attack, others remain persistently effective. We release all associated code.


Key findings
All tested deepfake detection systems exhibited vulnerabilities and could be reliably deceived by simple manipulations such as time-stretching or echo addition. Retraining with knowledge of a specific attack mitigated some attacks, while others remained persistently effective. A minimal set of adaptive augmentations was sufficient to match the performance of retraining on all attacks.
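To make the adaptive-augmentation idea concrete, here is a minimal sketch, not the paper's training code: attack-style transforms are applied at random to training audio so the detector sees manipulated samples during retraining. The transform set, parameter values, and probability below are assumptions.

```python
# Hypothetical sketch of adaptive augmentation for detector retraining.
# The transforms and probability are illustrative; the paper's exact
# augmentation set and schedule are not reproduced here.
import random
import numpy as np

def add_noise(audio: np.ndarray, snr_db: float = 20.0) -> np.ndarray:
    """Add white noise at a given signal-to-noise ratio (in dB)."""
    signal_power = np.mean(audio ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return audio + np.random.randn(len(audio)) * np.sqrt(noise_power)

def change_gain(audio: np.ndarray, gain_db: float = -6.0) -> np.ndarray:
    """Scale amplitude by a gain in decibels."""
    return audio * (10 ** (gain_db / 20))

ATTACK_AUGMENTATIONS = [add_noise, change_gain]  # assumed minimal subset

def augment(audio: np.ndarray, p: float = 0.5) -> np.ndarray:
    """With probability p, apply one randomly chosen attack-style transform."""
    if random.random() < p:
        audio = random.choice(ATTACK_AUGMENTATIONS)(audio)
    return audio
```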
Approach
DeePen systematically applies 17 signal processing modifications (attacks), such as time-stretching and echo addition, to audio samples from existing deepfake datasets. The modified samples are then used to evaluate the robustness of deepfake detection models, both with and without adaptive retraining on the attacks.
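A minimal sketch of two such attacks follows, assuming librosa and NumPy as the signal processing tooling; the function names, parameter values, and input file are illustrative, not the paper's implementation.

```python
# Illustrative implementations of two attacks named in the paper:
# time-stretching and echo addition. Parameters are assumed defaults.
import numpy as np
import librosa

def time_stretch_attack(audio: np.ndarray, rate: float = 1.1) -> np.ndarray:
    """Speed the signal up (rate > 1) or slow it down (rate < 1)
    without changing its pitch."""
    return librosa.effects.time_stretch(audio, rate=rate)

def echo_attack(audio: np.ndarray, sr: int, delay_s: float = 0.25,
                decay: float = 0.4) -> np.ndarray:
    """Mix a delayed, attenuated copy of the signal back into it."""
    delay = int(delay_s * sr)
    out = np.zeros(len(audio) + delay, dtype=audio.dtype)
    out[:len(audio)] += audio
    out[delay:] += decay * audio
    peak = np.max(np.abs(out))
    return out / peak if peak > 1.0 else out  # avoid clipping

if __name__ == "__main__":
    y, sr = librosa.load("sample.wav", sr=16000)  # hypothetical input file
    variants = [time_stretch_attack(y), echo_attack(y, sr)]
    # Each attacked variant would then be scored by the detector under test.
```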
Datasets
ASVspoof 2019, Multi-Language Audio Anti-Spoofing Dataset (MLAAD)
Model(s)
Raw PC-DARTS, LCNN, RawGAT-ST, RawNet2, WhisperDF, W2V2, and four undisclosed commercial systems
Author countries
Germany, Poland, Romania, South Korea