Adversarial Attacks on Audio Deepfake Detection: A Benchmark and Comparative Study
Authors: Kutub Uddin, Muhammad Umar Farooq, Awais Khan, Khalid Mahmood Malik
Published: 2025-09-08 18:33:24+00:00
AI Summary
This research paper presents a comparative benchmark study of state-of-the-art audio deepfake detection (ADD) methods under various anti-forensic (AF) attacks. The main contribution is a large-scale evaluation of twelve ADD methods across five datasets and two AF attack categories (statistical and optimization-based), revealing their vulnerabilities and informing the design of more robust detectors.
Abstract
Generative AI has become remarkably successful at producing highly realistic deepfakes, posing a serious threat to voice biometric applications, including speaker verification, audio conferencing, and criminal investigations. To counteract this, several state-of-the-art (SoTA) audio deepfake detection (ADD) methods have been proposed that identify generative AI signatures in order to distinguish real from deepfake audio. However, the effectiveness of these methods is severely undermined by anti-forensic (AF) attacks that conceal generative signatures. These AF attacks span a wide range of techniques, including statistical modifications (e.g., pitch shifting, filtering, noise addition, and quantization) and optimization-based attacks (e.g., FGSM, PGD, C&W, and DeepFool). In this paper, we investigate the SoTA ADD methods and provide a comparative analysis that highlights their effectiveness in exposing deepfake signatures, as well as their vulnerabilities under adversarial conditions. We conducted an extensive evaluation of ADD methods on five deepfake benchmark datasets, covering two categories of methods: raw and spectrogram-based approaches. This comparative analysis enables a deeper understanding of the strengths and limitations of SoTA ADD methods against diverse AF attacks. It not only highlights the vulnerabilities of ADD methods but also informs the design of more robust and generalized detectors for real-world voice biometrics. It will further guide future research in developing adaptive defense strategies that can effectively counter evolving AF techniques.
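To make the two attack categories concrete, the following is a minimal Python sketch of two statistical modifications (noise addition and quantization) and a single FGSM step. It is an illustration under stated assumptions, not the benchmark's implementation: the function names, the SNR-based noise scaling, the [-1, 1] waveform range, and the parameter values are all hypothetical, and the FGSM gradient is assumed to be supplied by the caller from whatever detector is under attack.

```python
import numpy as np


def add_noise(audio, snr_db, rng):
    """Add white Gaussian noise scaled to a target signal-to-noise ratio in dB."""
    signal_power = np.mean(audio ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=audio.shape)
    return audio + noise


def quantize(audio, n_bits):
    """Uniformly re-quantize a waveform in [-1, 1] to n_bits of resolution."""
    levels = 2 ** (n_bits - 1)
    return np.clip(np.round(audio * levels) / levels, -1.0, 1.0)


def fgsm_step(audio, grad, epsilon):
    """One FGSM step: shift each sample by epsilon along the sign of the
    loss gradient. `grad` is assumed to be d(loss)/d(audio) computed by the
    caller from the detector being attacked (not modeled here)."""
    return np.clip(audio + epsilon * np.sign(grad), -1.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 16000.0
    audio = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # 1 s synthetic tone at 16 kHz

    noisy = add_noise(audio, snr_db=20.0, rng=rng)
    coarse = quantize(audio, n_bits=8)
    # Placeholder gradient for illustration only; a real attack would
    # backpropagate through the detector to obtain it.
    adversarial = fgsm_step(audio, grad=np.ones_like(audio), epsilon=0.01)
    print(noisy.shape, coarse.shape, adversarial.shape)
```

The statistical modifications need no access to the detector, whereas the FGSM step depends on a gradient from the model. This is why the two categories are usually treated as different threat models.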