SCDF: A Speaker Characteristics DeepFake Speech Dataset for Bias Analysis

Authors: Vojtěch Staněk, Karel Srna, Anton Firc, Kamil Malinka

Published: 2025-08-11 12:58:37+00:00

AI Summary

The paper introduces the Speaker Characteristics DeepFake (SCDF) dataset, a large-scale, richly annotated resource for evaluating demographic biases in deepfake speech detection. Using SCDF, the authors demonstrate that state-of-the-art detectors exhibit significant performance disparities across various speaker demographics, highlighting the need for bias-aware development.

Abstract

Despite growing attention to deepfake speech detection, the aspects of bias and fairness remain underexplored in the speech domain. To address this gap, we introduce the Speaker Characteristics Deepfake (SCDF) dataset: a novel, richly annotated resource enabling systematic evaluation of demographic biases in deepfake speech detection. SCDF contains over 237,000 utterances in a balanced representation of both male and female speakers spanning five languages and a wide age range. We evaluate several state-of-the-art detectors and show that speaker characteristics significantly influence detection performance, revealing disparities across sex, language, age, and synthesizer type. These findings highlight the need for bias-aware development and provide a foundation for building non-discriminatory deepfake detection systems aligned with ethical and regulatory standards.


Key findings
State-of-the-art deepfake speech detectors show significant performance disparities across speaker sex, language, age, and synthesizer type. These disparities indicate bias in existing models, most visibly for older speakers and certain languages, and underline the need for bias-aware development of deepfake detection systems.
Approach
The authors created the SCDF dataset with balanced representation of male and female speakers across five languages and a wide age range, using four state-of-the-art synthesizers. They evaluated several existing deepfake detection models on this dataset to analyze their performance across different demographic subgroups and identify biases.
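A subgroup analysis of this kind can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: it assumes the equal error rate (EER) as the metric, which is common in deepfake speech detection but is not confirmed by this summary, and the function names `equal_error_rate` and `eer_by_group`, as well as the toy scores and group labels, are hypothetical.

```python
import numpy as np


def equal_error_rate(scores, labels):
    """Approximate EER; higher score means 'more likely bona fide'.

    labels: 1 = bona fide, 0 = spoof (an assumed convention).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    bona = scores[labels == 1]
    spoof = scores[labels == 0]
    if bona.size == 0 or spoof.size == 0:
        raise ValueError("need at least one bona fide and one spoof sample")

    # Candidate thresholds: every observed score, plus one above the maximum
    # so that the "reject everything" operating point is included.
    thresholds = np.unique(scores)
    thresholds = np.concatenate([thresholds, [thresholds[-1] + 1.0]])

    # A sample is accepted as bona fide when score >= threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])  # spoof accepted
    frr = np.array([(bona < t).mean() for t in thresholds])    # bona fide rejected

    # Take the threshold where the two error rates are closest and average them.
    idx = int(np.argmin(np.abs(far - frr)))
    return float((far[idx] + frr[idx]) / 2.0)


def eer_by_group(scores, labels, groups):
    """Compute the EER separately for each value of a demographic attribute."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    groups = np.asarray(groups)
    return {
        str(g): equal_error_rate(scores[groups == g], labels[groups == g])
        for g in np.unique(groups)
    }


# Toy data, invented for illustration only (not taken from SCDF).
demo_scores = [0.9, 0.8, 0.1, 0.2, 0.9, 0.3, 0.1, 0.6]
demo_labels = [1, 1, 0, 0, 1, 1, 0, 0]
demo_groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
demo_result = eer_by_group(demo_scores, demo_labels, demo_groups)
print(demo_result)
```

On the toy data, group A is perfectly separable while group B is not, so the per-group EERs differ; a gap like this between subgroups is the sort of disparity the analysis looks for. The same function can be called once per attribute (sex, language, age band, synthesizer) by passing a different `groups` array.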
Datasets
SCDF (Speaker Characteristics DeepFake) dataset, VoxPopuli, ParCzech
Model(s)
AASIST, MHFA, and SLS classifiers, each used with XLSR-300M as the feature extractor
Author countries
Czech Republic