Trusted Fake Audio Detection Based on Dirichlet Distribution

Authors: Chi Ding, Junxiao Xue, Cong Wang, Hao Zhou

Published: 2025-06-03 03:40:39+00:00

AI Summary

This paper introduces a novel fake audio detection approach that enhances reliability by modeling the trustworthiness of model decisions using the Dirichlet distribution. The approach generates evidence via a neural network, models uncertainty with the Dirichlet distribution, and combines predicted probabilities with uncertainty estimates for final classification.

Abstract

With the continuous development of deep learning-based speech conversion and speech synthesis technologies, the cybersecurity problem posed by fake audio has become increasingly serious. Previously proposed models for defending against fake audio have attained remarkable performance. However, they all fall short in modeling the trustworthiness of the decisions made by the models themselves. To address this, we put forward a trusted fake audio detection approach based on the Dirichlet distribution, with the aim of enhancing the reliability of fake audio detection. Specifically, we first generate evidence through a neural network. Uncertainty is then modeled using the Dirichlet distribution. By modeling the belief distribution with the parameters of the Dirichlet distribution, an estimate of uncertainty can be obtained for each decision. Finally, the predicted probabilities and corresponding uncertainty estimates are combined to form the final opinion. On the ASVspoof datasets (ASVspoof 2019 LA, ASVspoof 2021 LA, and ASVspoof 2021 DF), we conduct a number of comparison experiments to verify the performance of the proposed model in terms of accuracy, robustness, and trustworthiness.


Key findings
The proposed trusted models significantly outperform existing methods on the ASVspoof datasets in terms of EER, min t-DCF, aECE, and PCC. They are more accurate and more robust, with lower uncertainty and better calibration. Accuracy falls as the estimated uncertainty rises, so the uncertainty estimate indicates how far a given decision can be trusted.
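For readers unfamiliar with EER (equal error rate), the sketch below shows one simple way to compute it from detection scores. It is an illustration only, not the paper's evaluation code. The function name `equal_error_rate`, the convention that a higher score means "more likely bona fide", and the plain threshold sweep are assumptions made here; official ASVspoof scoring tools may interpolate between thresholds and give slightly different values.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    Assumption: higher score = more likely bona fide. A trial is accepted
    as bona fide when its score is >= the threshold.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap = float("inf")
    best_eer = None
    for t in thresholds:
        # False acceptance rate: spoofed audio accepted as bona fide.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection rate: bona fide audio rejected as spoofed.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap = gap
            best_eer = (far + frr) / 2
    return best_eer


# Hypothetical toy scores, not taken from the paper.
toy_bonafide = [0.9, 0.8, 0.7, 0.3]
toy_spoof = [0.1, 0.2, 0.4, 0.6]
print(equal_error_rate(toy_bonafide, toy_spoof))
```

A lower EER means the bona fide and spoofed score distributions overlap less.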
Approach
The authors propose a three-step method. First, a neural network generates evidence: existing models such as AASIST, RawNet2, and RawGAT-ST are modified by replacing the softmax layer with a softplus layer. Second, uncertainty is modeled with the Dirichlet distribution. Third, an opinion is formed by combining the predicted probabilities with the uncertainty estimates. Each decision therefore comes with an uncertainty estimate, which is intended to make the detection more reliable and trustworthy.
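The three steps above can be sketched in a few lines. This summary does not give the paper's exact equations, so the sketch assumes the formulation commonly used in evidential deep learning with subjective logic: Dirichlet parameters alpha_k = e_k + 1, belief b_k = e_k / S, and uncertainty u = K / S, where S is the sum of the alpha_k and K is the number of classes. The function names `softplus` and `dirichlet_opinion` and the example logits are hypothetical.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)); the result is non-negative,
    # so it can serve as evidence.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dirichlet_opinion(logits):
    """Turn raw network outputs into (belief, uncertainty, probability).

    Assumed formulation (not confirmed by the summary):
    alpha_k = evidence_k + 1, S = sum(alpha), b_k = evidence_k / S,
    u = K / S, p_k = alpha_k / S.
    """
    num_classes = len(logits)
    # Step 1: evidence generation (softplus in place of softmax).
    evidence = [softplus(z) for z in logits]
    # Step 2: Dirichlet parameters and their total strength.
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)
    # Step 3: opinion = per-class belief plus an overall uncertainty mass.
    belief = [e / strength for e in evidence]
    uncertainty = num_classes / strength
    # Expected class probabilities under the Dirichlet distribution.
    probability = [a / strength for a in alpha]
    return belief, uncertainty, probability


# Two classes (e.g. bona fide vs. spoof); the logits are made up.
confident = dirichlet_opinion([5.0, -5.0])
ambiguous = dirichlet_opinion([0.0, 0.0])
print("confident uncertainty:", confident[1])
print("ambiguous uncertainty:", ambiguous[1])
```

Under these assumptions the belief masses and the uncertainty always sum to 1, and more total evidence gives lower uncertainty, so a detector could flag or defer decisions whose uncertainty exceeds a chosen threshold.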
Datasets
ASVspoof 2019 LA, ASVspoof 2021 LA, ASVspoof 2021 DF
Model(s)
AASIST, RawNet2, RawGAT-ST (modified with a softplus layer to generate positive evidence for the Dirichlet distribution)
Author countries
China