Generalized Source Tracing: Detecting Novel Audio Deepfake Algorithm with Real Emphasis and Fake Dispersion Strategy


Authors: Yuankun Xie, Ruibo Fu, Zhengqi Wen, Zhiyong Wang, Xiaopeng Wang, Haonnan Cheng, Long Ye, Jianhua Tao

Published: 2024-06-05 13:16:55+00:00

AI Summary

This paper introduces the Real Emphasis and Fake Dispersion (REFD) strategy for audio deepfake algorithm recognition, focusing on both in-distribution (ID) and out-of-distribution (OOD) detection. REFD uses a two-stage approach, emphasizing real audio detection in the first stage and focusing on fake audio classification and OOD detection in the second, achieving a state-of-the-art 86.83% F1-score on Audio Deepfake Detection Challenge 2023 Track 3.

Abstract

With the proliferation of deepfake audio, there is an urgent need to investigate its attribution. Current source tracing methods can effectively distinguish in-distribution (ID) categories. However, the rapid evolution of deepfake algorithms poses a critical challenge for the accurate identification of out-of-distribution (OOD) novel deepfake algorithms. In this paper, we propose the Real Emphasis and Fake Dispersion (REFD) strategy for audio deepfake algorithm recognition, demonstrating its effectiveness in discriminating ID samples while identifying OOD samples. For effective OOD detection, we first explore current post-hoc OOD methods and then propose NSD, a novel OOD approach for identifying novel deepfake algorithms by considering the similarity of both features and logit scores. REFD achieves an 86.83% F1-score as a single system in Audio Deepfake Detection Challenge 2023 Track 3, showcasing its state-of-the-art performance.

Key findings

REFD achieved an 86.83% F1-score on Audio Deepfake Detection Challenge 2023 Track 3 (ADD2023T3), outperforming other single-system approaches. The proposed NSD OOD detection method significantly improved OOD detection accuracy. The two-stage approach proved superior to single-stage methods in both real and OOD sample detection.

Approach

REFD employs a two-stage training approach. The first stage emphasizes real audio detection using OC-Softmax. The second stage uses RegMixup to reduce overconfidence in the classification logits and applies NSD, a novel OOD detection method that considers both feature and logit similarity, to identify novel deepfake algorithms.
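
The two-stage decision flow described above can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names, the thresholds, and the way feature and logit similarity are combined (a plain average of cosine similarities to known ID samples) are assumptions made for illustration, and the exact NSD formulation is not reproduced here. The `real_score` input stands in for whatever score the first-stage model produces.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_score(feature, logits, id_features, id_logits):
    """Hypothetical ID-likeness score: higher means closer to known ID samples.

    Assumption: take the best cosine match against stored ID features and
    stored ID logits, then average the two. The paper's NSD may differ.
    """
    feat_sim = max(cosine(feature, f) for f in id_features)
    logit_sim = max(cosine(logits, l) for l in id_logits)
    return 0.5 * (feat_sim + logit_sim)


def two_stage_predict(real_score, feature, logits, id_features, id_logits,
                      real_threshold=0.5, ood_threshold=0.8):
    """Route a sample through the two stages (thresholds are assumed values)."""
    # Stage 1: decide real vs. fake first.
    if real_score >= real_threshold:
        return "real"
    # Stage 2: reject samples that resemble no known deepfake algorithm.
    if similarity_score(feature, logits, id_features, id_logits) < ood_threshold:
        return "ood"
    # Otherwise report the most likely known (ID) deepfake algorithm.
    best_class = max(range(len(logits)), key=lambda i: logits[i])
    return "fake_%d" % best_class


if __name__ == "__main__":
    # Toy reference set: one stored feature/logit pair per known algorithm.
    id_features = [[1.0, 0.0], [0.0, 1.0]]
    id_logits = [[5.0, 0.0], [0.0, 5.0]]
    print(two_stage_predict(0.1, [1.0, 0.1], [4.0, 0.5], id_features, id_logits))
```

In this sketch a sample is only assigned to a known algorithm when both its embedding and its logit vector resemble something seen in training; a sample that sits between known classes in both spaces is flagged as a novel algorithm.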

Datasets

Audio Deepfake Detection Challenge 2023 Track 3 (ADD2023T3)

Model(s)

Wav2Vec2-AASIST (Wav2Vec2 features are extracted and fed into AASIST)

Author countries

China
