Targeted Augmented Data for Audio Deepfake Detection

Authors: Marcella Astrid, Enjie Ghorbel, Djamila Aouada

Published: 2024-07-10 12:31:53+00:00

AI Summary

This paper proposes a novel data augmentation method for improving the generalization capabilities of audio deepfake detectors. By perturbing real audio data to create pseudo-fakes near the model's decision boundary, the method enhances the diversity of training data and mitigates overfitting to specific manipulation techniques.

Abstract

The availability of highly convincing audio deepfake generators highlights the need for designing robust audio deepfake detectors. Existing works often rely solely on real and fake data available in the training set, which may lead to overfitting, thereby reducing the robustness to unseen manipulations. To enhance the generalization capabilities of audio deepfake detectors, we propose a novel augmentation method for generating audio pseudo-fakes targeting the decision boundary of the model. Inspired by adversarial attacks, we perturb original real data to synthesize pseudo-fakes with ambiguous prediction probabilities. Comprehensive experiments on two well-known architectures demonstrate that the proposed augmentation contributes to improving the generalization capabilities of these architectures.


Key findings
The proposed augmentation significantly improved the performance of both the AASIST and RawNet2 models on the ASVspoof 2019 dataset, as measured by min t-DCF and EER. These results demonstrate the effectiveness of targeting ambiguous pseudo-fakes for audio deepfake detection, with performance competitive with state-of-the-art methods.
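For reference, EER (Equal Error Rate) summarizes a detector's score distributions as the operating point where the false-acceptance rate (spoofs scored as bona fide) equals the false-rejection rate (bona fide scored as spoof); min t-DCF additionally weights these errors by the costs of a downstream speaker-verification system. A minimal NumPy sketch of EER, assuming higher scores indicate bona fide audio (illustrative only, not the paper's evaluation code):

```python
import numpy as np

def compute_eer(bonafide_scores, spoof_scores):
    """Equal Error Rate: the threshold at which the false-acceptance
    rate equals the false-rejection rate. Assumes higher score = more
    likely bona fide. Illustrative sketch only."""
    thresholds = np.sort(np.concatenate([bonafide_scores, spoof_scores]))
    far = np.array([(spoof_scores >= t).mean() for t in thresholds])  # false acceptance
    frr = np.array([(bonafide_scores < t).mean() for t in thresholds])  # false rejection
    idx = np.argmin(np.abs(far - frr))  # point where the two rates cross
    return (far[idx] + frr[idx]) / 2.0
```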
Approach
The authors augment the training data with pseudo-fake audio samples. These samples are created by perturbing real audio with an adversarial-attack-like procedure that drives the detector's predictions into a region of ambiguous classification probabilities near the decision boundary between real and fake audio. Training on these boundary samples is intended to improve the model's robustness to unseen deepfakes; a sketch of the idea follows below.
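A minimal PyTorch sketch of such a boundary-targeting perturbation, assuming a two-class detector that maps a waveform batch to (batch, 2) logits with index 0 = real; the target probability, step sizes, and L-inf projection here are illustrative assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def generate_pseudo_fakes(model, real_audio, target_prob=0.5,
                          epsilon=1e-3, step_size=5e-4, steps=5):
    """Perturb real waveforms so the detector's 'real' probability is
    pushed toward an ambiguous target (e.g., 0.5), producing
    pseudo-fakes near the decision boundary.

    Hypothetical sketch: assumes `model` returns (batch, 2) logits
    with index 0 = real; all hyperparameters are illustrative.
    """
    x = real_audio.clone().detach()
    perturbed = x.clone().requires_grad_(True)
    target = torch.full((x.size(0),), target_prob, device=x.device)

    for _ in range(steps):
        p_real = F.softmax(model(perturbed), dim=-1)[:, 0]
        # Loss is minimized when the prediction sits at the ambiguous target.
        loss = F.mse_loss(p_real, target)
        grad, = torch.autograd.grad(loss, perturbed)
        with torch.no_grad():
            perturbed = perturbed - step_size * grad.sign()
            # Project back into a small L-inf ball around the real sample
            # so the pseudo-fake stays perceptually close to real audio.
            perturbed = x + torch.clamp(perturbed - x, -epsilon, epsilon)
        perturbed.requires_grad_(True)

    # The perturbed samples are labeled 'fake' when added to training.
    return perturbed.detach()
```

During training, such pseudo-fakes are mixed into the fake class, so the detector must tighten its boundary around genuine speech rather than memorizing the artifacts of the spoofing methods seen in training.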
Datasets
ASVspoof 2019 logical access (LA) dataset
Model(s)
AASIST and RawNet2
Author countries
Luxembourg, Tunisia