Open-Set Source Tracing of Audio Deepfake Systems

Authors: Nicholas Klein, Hemlata Tak, Elie Khoury

Published: 2025-07-09 01:03:36+00:00

AI Summary

This paper addresses open-set source tracing of audio deepfake systems. It introduces the softmax energy (SME) score for out-of-distribution (OOD) detection; replacing the typical temperature-scaled energy score with SME yields a relative average improvement of 31% in FPR95. Combining SME with SME-guided training and copy synthesis, codec, and reverberation augmentations yields an FPR95 of 8.3%.

Abstract

Existing research on source tracing of audio deepfake systems has focused primarily on the closed-set scenario, while studies that evaluate open-set performance are limited to a small number of unseen systems. Due to the large number of emerging audio deepfake systems, robust open-set source tracing is critical. We leverage the protocol of the Interspeech 2025 special session on source tracing to evaluate methods for improving open-set source tracing performance. We introduce a novel adaptation to the energy score for out-of-distribution (OOD) detection, softmax energy (SME). We find that replacing the typical temperature-scaled energy score with SME provides a relative average improvement of 31% in the standard FPR95 (false positive rate at true positive rate of 95%) measure. We further explore SME-guided training as well as copy synthesis, codec, and reverberation augmentations, yielding an FPR95 of 8.3%.
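The FPR95 measure named in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation code. It assumes that in-distribution (known-system) samples are the positive class and that a higher score means "more in-distribution". The function name `fpr_at_tpr` and the toy scores are hypothetical.

```python
import numpy as np

def fpr_at_tpr(pos_scores, neg_scores, tpr=0.95):
    """False positive rate at the threshold that accepts at least `tpr` of positives.

    Assumption: higher score = more likely positive (in-distribution).
    """
    pos = np.sort(np.asarray(pos_scores, dtype=float))  # ascending order
    neg = np.asarray(neg_scores, dtype=float)
    # Number of positives that must be accepted; the small epsilon guards
    # against floating-point error in tpr * n before taking the ceiling.
    k = max(1, int(np.ceil(tpr * pos.size - 1e-9)))
    # Accepting every score >= the k-th largest positive keeps at least k positives.
    threshold = pos[pos.size - k]
    # Fraction of negatives (OOD samples) wrongly accepted at that threshold.
    return float(np.mean(neg >= threshold))

# Toy example: 100 in-distribution scores and 5 OOD scores.
in_dist = np.arange(1, 101)
ood = np.array([0, 2, 5, 6, 50])
fpr95 = fpr_at_tpr(in_dist, ood)
```

For energy-style scores, where a lower value usually indicates an in-distribution sample, the scores would be negated before being passed to this function.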

Key findings

Replacing the energy score with SME resulted in a 31% relative average improvement in FPR95. Combining SME with SME-guided training and data augmentation achieved an FPR95 of 8.3%. The best model outperformed a reference system by a 52% absolute margin in unweighted EER.

Approach

The authors propose a novel softmax energy (SME) score for OOD detection, adapting the energy score by applying a softmax function to the logits before computing the energy. They also explore SME-guided training using auxiliary OOD data and data augmentation techniques (copy synthesis, codec, and reverberation) to improve open-set performance.
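The difference between the two scores can be illustrated with a short NumPy sketch. It is based only on the description above and is not the authors' implementation. The standard temperature-scaled energy score is taken as -T * logsumexp(logits / T). The SME form shown here, the negative logsumexp of the softmax probabilities, is an assumption; the paper's exact formulation (sign convention, temperature handling) may differ. All function names are hypothetical.

```python
import numpy as np

def logsumexp(x, axis=-1):
    # Numerically stable log(sum(exp(x))) along the given axis.
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def energy_score(logits, temperature=1.0):
    # Standard temperature-scaled energy score computed on raw logits.
    logits = np.asarray(logits, dtype=float)
    return -temperature * logsumexp(logits / temperature)

def softmax_energy_score(logits):
    # Assumed SME form: apply softmax to the logits first, then compute
    # the energy of the resulting probabilities.
    probs = softmax(np.asarray(logits, dtype=float))
    return -logsumexp(probs)

confident = np.array([10.0, 0.0, 0.0])  # one class clearly dominates
flat = np.array([0.0, 0.0, 0.0])        # no class preferred

sme_confident = softmax_energy_score(confident)
sme_flat = softmax_energy_score(flat)
```

Under this sign convention a lower score indicates a more in-distribution sample, so `sme_confident` is lower than `sme_flat`. Because the softmax output is bounded, this form of SME is also bounded and does not depend on the overall scale of the logits in the way the raw energy score does.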

Datasets

MLAAD dataset, ASVspoof 5 dataset

Model(s)

ResNet34 with Large Margin Cosine Loss (LMCL)
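For readers unfamiliar with LMCL, the loss can be sketched as follows. This is a generic NumPy illustration of large margin cosine loss, not the authors' training code. The `scale` and `margin` values are hypothetical defaults and are not taken from the paper. The embeddings stand in for the output of the ResNet34 backbone.

```python
import numpy as np

def lmcl_loss(embeddings, class_weights, labels, scale=30.0, margin=0.35):
    """Large margin cosine loss averaged over a batch.

    embeddings:    (batch, dim) array of embedding vectors
    class_weights: (classes, dim) array, one weight vector per class
    labels:        (batch,) integer class indices
    """
    labels = np.asarray(labels)
    # L2-normalize both sides so the dot product is a cosine similarity.
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cos = x @ w.T  # shape (batch, classes)
    rows = np.arange(labels.size)
    # Subtract the margin from the target-class cosine only.
    cos[rows, labels] -= margin
    logits = scale * cos
    # Numerically stable log-softmax followed by cross-entropy.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[rows, labels].mean())

# Toy example: two perfectly aligned embeddings and two classes.
emb = np.eye(2)
weights = np.eye(2)
targets = np.array([0, 1])
loss_with_margin = lmcl_loss(emb, weights, targets, scale=10.0, margin=0.35)
loss_no_margin = lmcl_loss(emb, weights, targets, scale=10.0, margin=0.0)
```

The margin makes the target class harder to satisfy, so `loss_with_margin` is larger than `loss_no_margin` even for correctly classified samples. This pushes embeddings of the same class closer together in angle.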

Author countries

USA
