Generalizable Detection of Audio Deepfakes


Authors: Jose A. Lopez, Georg Stemmer, Héctor Cordourier Maruri

Published: 2025-07-02 14:28:11+00:00

AI Summary

This paper presents a comprehensive study to improve the generalization of audio deepfake detection models. It explores various pre-trained backbones, data augmentation techniques, and loss functions, achieving performance surpassing the top single system in the ASVspoof 5 Challenge.

Abstract

In this paper, we present our comprehensive study aimed at enhancing the generalization capabilities of audio deepfake detection models. We investigate the performance of various pre-trained backbones, including Wav2Vec2, WavLM, and Whisper, across a diverse set of datasets, including those from the ASVspoof challenges and additional sources. Our experiments focus on the effects of different data augmentation strategies and loss functions on model performance. The results of our research demonstrate substantial enhancements in the generalization capabilities of audio deepfake detection models, surpassing the performance of the top-ranked single system in the ASVspoof 5 Challenge. This study contributes valuable insights into the optimization of audio models for more robust deepfake detection and facilitates future research in this critical area.

Key findings

The study demonstrates significant improvements in the generalization of audio deepfake detection models, exceeding the performance of the best single system in the ASVspoof 5 Challenge. Focal loss and hinged-center loss, together with specific data augmentation techniques, proved crucial to these results. The model was robust across datasets and showed no bias with respect to speaker attributes.
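As a rough illustration of the focal loss mentioned above, the sketch below implements the standard binary form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). It is not taken from the paper: the function name `binary_focal_loss`, the label convention (1 = spoof, 0 = bona fide), and the defaults `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration, and the authors' actual settings may differ.

```python
import math


def binary_focal_loss(prob_spoof, label, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for a single example (hypothetical helper, not the paper's code).

    prob_spoof: predicted probability that the clip is spoofed, in [0, 1].
    label:      1 for spoof, 0 for bona fide (assumed convention).
    gamma:      focusing parameter; gamma = 0 reduces to weighted cross-entropy.
    alpha:      weight on the positive (spoof) class; 1 - alpha on the negative.
    """
    # Clamp to avoid log(0).
    p = min(max(prob_spoof, eps), 1.0 - eps)
    # p_t is the probability the model assigned to the true class.
    p_t = p if label == 1 else 1.0 - p
    alpha_t = alpha if label == 1 else 1.0 - alpha
    # (1 - p_t)^gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

The modulating factor is what distinguishes this from plain cross-entropy: a confidently correct prediction contributes very little, so training concentrates on hard or rare examples.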

Approach

The authors improve audio deepfake detection by experimenting with different pre-trained backbones (Wav2Vec2, WavLM, Whisper), data augmentation strategies (additive white Gaussian noise (AWGN), RawBoost, vocoding, and room impulse responses (RIR)), and loss functions (focal loss, hinged-center loss). They combine these techniques into a model that generalizes well across diverse datasets.
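Of the augmentations listed, AWGN is the simplest to illustrate. The sketch below adds Gaussian noise to a waveform at a requested signal-to-noise ratio. It is a minimal example under stated assumptions, not the authors' pipeline: the function name `add_awgn`, the list-of-floats waveform representation, and the SNR-in-decibels parameterization are all hypothetical choices made here.

```python
import math
import random


def add_awgn(samples, snr_db, rng=None):
    """Return a copy of `samples` with white Gaussian noise at `snr_db` dB SNR.

    samples: sequence of float audio samples (assumed representation).
    snr_db:  target signal-to-noise ratio in decibels.
    rng:     optional random.Random instance for reproducibility.
    """
    if len(samples) == 0:
        raise ValueError("samples must be non-empty")
    rng = rng or random.Random()
    # Mean power of the clean signal.
    signal_power = sum(x * x for x in samples) / len(samples)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in samples]


# Example: a one-second 440 Hz tone at 16 kHz, corrupted at 10 dB SNR.
tone = [math.sin(2.0 * math.pi * 440.0 * n / 16000.0) for n in range(16000)]
noisy_tone = add_awgn(tone, snr_db=10.0, rng=random.Random(0))
```

In training, the SNR would typically be drawn at random per utterance so the detector sees a range of noise levels; the range to use is a tuning choice not specified in this summary.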

Datasets

ASVspoof 2015, ASVspoof 2019 LA, ASVspoof 2021 LA, ASVspoof 2021 DF, ASVspoof 5, In-The-Wild (ITW), M-AILABS, MLAAD v4, DeepFake Detection Challenge (DFDC), FakeAVCeleb, Speecon US

Model(s)

Wav2Vec2, WavLM, Whisper, XLS-R 300M, XLS-R 1B

Author countries

USA, Germany, Mexico
