Post-training for Deepfake Speech Detection
Authors: Wanying Ge, Xin Wang, Xuechen Liu, Junichi Yamagishi
Published: 2025-06-26 08:34:19+00:00
Comment: Corrected the previous implementation of the EER calculation, which slightly changes some of the numerical results.
Abstract
We introduce a post-training approach that adapts self-supervised learning (SSL) models for deepfake speech detection by bridging the gap between general pre-training and domain-specific fine-tuning. We present AntiDeepfake models, a series of post-trained models developed using a large-scale multilingual speech dataset containing over 56,000 hours of genuine speech and 18,000 hours of speech with various artifacts in over one hundred languages. Experimental results show that the post-trained models already exhibit strong robustness and generalization to unseen deepfake speech. When they are further fine-tuned on the Deepfake-Eval-2024 dataset, these models consistently surpass existing state-of-the-art detectors that do not leverage post-training. Model checkpoints and source code are available online.
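The revision comment refers to the equal error rate (EER). This is the operating point where the false-acceptance rate on deepfake trials equals the miss rate on genuine trials. Below is a minimal, hypothetical sketch of one way to compute EER from detector scores. It assumes that higher scores mean "more likely genuine" and uses the illustrative function name `compute_eer`. It is not the authors' corrected implementation.

```python
from typing import Sequence, Tuple


def compute_eer(bonafide: Sequence[float], spoof: Sequence[float]) -> Tuple[float, float]:
    """Return (eer, threshold) for a binary deepfake detector.

    Assumption: a higher score means the trial is more likely genuine (bona fide).
    """
    if not bonafide or not spoof:
        raise ValueError("need at least one score per class")

    # Candidate thresholds: every observed score, plus +inf (reject everything).
    thresholds = sorted(set(bonafide) | set(spoof)) + [float("inf")]

    best = None  # (gap between the two error rates, eer estimate, threshold)
    for t in thresholds:
        # Miss rate: genuine trials rejected because their score is below t.
        frr = sum(s < t for s in bonafide) / len(bonafide)
        # False-acceptance rate: deepfake trials accepted because their score is >= t.
        far = sum(s >= t for s in spoof) / len(spoof)
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            # Approximate EER as the mean of the two rates where they are closest.
            best = (gap, (frr + far) / 2, t)

    return best[1], best[2]


if __name__ == "__main__":
    # Toy scores only, for illustration; not results from the paper.
    print(compute_eer([0.6, 0.4], [0.5, 0.3]))
```

This brute-force scan is O(n^2) and is only meant to make the definition concrete. Evaluation toolkits typically sort the scores once and sweep the threshold in a single pass. Small differences in how ties and the crossing point are handled can shift reported EER values slightly.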