Benchmarking Audio Deepfake Detection Robustness in Real-world Communication Scenarios

Authors: Haohan Shi, Xiyu Shi, Safak Dogan, Saif Alzubi, Tianjin Huang, Yunxiao Zhang

Published: 2025-04-16 18:44:05+00:00

AI Summary

This research introduces ADD-C, a new benchmark dataset for evaluating audio deepfake detection (ADD) systems' robustness under real-world communication conditions (codec compression and packet loss). A novel data augmentation strategy is proposed to improve ADD system performance on ADD-C, significantly enhancing robustness against these real-world degradations.

Abstract

Existing Audio Deepfake Detection (ADD) systems often struggle to generalise effectively due to the significantly degraded audio quality caused by audio codec compression and channel transmission effects in real-world communication scenarios. To address this challenge, we developed a rigorous benchmark to evaluate the performance of the ADD system under such scenarios. We introduced ADD-C, a new test dataset to evaluate the robustness of ADD systems under diverse communication conditions, including different combinations of audio codecs for compression and packet loss rates. Benchmarking three baseline ADD models on the ADD-C dataset demonstrated a significant decline in robustness under such conditions. A novel Data Augmentation (DA) strategy was proposed to improve the robustness of ADD systems. Experimental results demonstrated that the proposed approach significantly enhances the performance of ADD systems on the proposed ADD-C dataset. Our benchmark can assist future efforts towards building practical and robustly generalisable ADD systems.

Key findings

Baseline ADD models showed significant performance degradation under simulated real-world communication conditions. The proposed data augmentation strategy substantially improved the robustness of the ADD systems across all tested conditions, maintaining performance even with significant codec compression and packet loss.

Approach

The authors address the challenge of ADD robustness in real-world scenarios by creating a new benchmark dataset (ADD-C) that simulates various audio codecs and packet loss rates. They also propose a data augmentation strategy that adds the simulated communication effects to the training data, so that models are exposed to these degradations before being evaluated on ADD-C.
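To make the idea of simulated packet loss concrete, here is a minimal sketch and not the authors' implementation. It assumes fixed-length packets, independent random drops, and zero-filling of lost packets. The function name `simulate_packet_loss`, the 20 ms packet length, and the 16 kHz sample rate are all hypothetical choices for illustration. Codec compression is not covered here.

```python
import numpy as np


def simulate_packet_loss(waveform, sample_rate=16000, packet_ms=20,
                         loss_rate=0.1, rng=None):
    """Zero out randomly chosen fixed-length packets of a mono waveform.

    Hypothetical helper: the packet length, the independent-drop model and
    the zero-fill concealment are assumptions, not details from the paper.
    Returns the degraded copy and a boolean array marking lost packets.
    """
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be in [0, 1]")
    packet_len = int(sample_rate * packet_ms / 1000)
    if packet_len <= 0:
        raise ValueError("packet length must be at least one sample")
    rng = np.random.default_rng() if rng is None else rng

    # Work on a float32 copy so the caller's array is left untouched.
    degraded = np.array(waveform, dtype=np.float32, copy=True)
    n_packets = int(np.ceil(len(degraded) / packet_len))

    # Each packet is dropped independently with probability loss_rate.
    lost = rng.random(n_packets) < loss_rate
    for i in np.flatnonzero(lost):
        degraded[i * packet_len:(i + 1) * packet_len] = 0.0
    return degraded, lost


# Usage: degrade one second of a synthetic tone at a 20% loss rate.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000).astype(np.float32)
degraded, lost = simulate_packet_loss(clean, loss_rate=0.2, rng=rng)
```

Applying a function like this to a random subset of training utterances, with varying loss rates, is one simple way to expose a model to transmission artefacts during training. Real systems often use burst-loss models and packet loss concealment rather than independent drops and silence.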

Datasets

ADD-C (created by the authors based on Fake-or-Real, Wavefake, LJSpeech, MLAAD, M-AILABS, and ASVspoof2021 Logical Access datasets)

Model(s)

GMM, LCNN, AASIST (baseline models); three novel models proposed by the authors using LFCC, CQCC, and raw waveform features.

Author countries

UK
