TADA: Training-free Attribution and Out-of-Domain Detection of Audio Deepfakes
Authors: Adriana Stan, David Combei, Dan Oneata, Horia Cucu
Published: 2025-06-06 07:00:23+00:00
AI Summary
This paper introduces TADA, a training-free method for audio deepfake source attribution and out-of-domain detection. It leverages embeddings from a pre-trained self-supervised learning (SSL) model with k-Nearest Neighbors (kNN), achieving an F1-score of 0.93 for in-domain source attribution and 0.84 for out-of-domain detection across multiple datasets.
Abstract
Deepfake detection has gained significant attention across audio, text, and image modalities, with high accuracy in distinguishing real from fake. However, identifying the exact source--such as the system or model behind a deepfake--remains a less studied problem. In this paper, we take a significant step forward in audio deepfake model attribution, or source tracing, by proposing a training-free, green-AI approach based entirely on k-Nearest Neighbors (kNN). Leveraging a pre-trained self-supervised learning (SSL) model, we show that grouping samples from the same generator is straightforward--we obtain a 0.93 F1-score across five deepfake datasets. The method also demonstrates strong out-of-domain (OOD) detection, effectively identifying samples from unseen models at an F1-score of 0.84. We further analyse these results along multiple dimensions and provide additional insights. All code and data protocols used in this work are available in our open repository: https://github.com/adrianastan/tada/.
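The abstract's core idea can be sketched as follows: embed audio with a frozen SSL model, attribute a query to a generator by majority vote over its k nearest training embeddings, and flag it as out-of-domain when its mean kNN distance exceeds a threshold calibrated on in-domain data. The sketch below is a minimal NumPy illustration under assumed conditions; the embedding dimension, cluster data, `k`, and threshold are all hypothetical stand-ins, not the paper's actual configuration (see the linked repository for that).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for SSL embeddings: three known generators,
# each producing a cluster of 16-dim embedding vectors.
dim = 16
centers = rng.normal(size=(3, dim)) * 5.0
train_X = np.vstack([c + rng.normal(scale=0.5, size=(50, dim)) for c in centers])
train_y = np.repeat(np.arange(3), 50)

def knn_attribute(x, k=5):
    """Return (majority-vote generator label, mean distance to k nearest neighbours)."""
    d = np.linalg.norm(train_X - x, axis=1)   # Euclidean distance to every training sample
    idx = np.argsort(d)[:k]                   # indices of the k nearest neighbours
    votes = np.bincount(train_y[idx], minlength=3)
    return int(votes.argmax()), float(d[idx].mean())

# In-domain query: a sample drawn near a known generator's cluster.
in_dom = centers[1] + rng.normal(scale=0.5, size=dim)
label, dist_in = knn_attribute(in_dom)

# Out-of-domain query: a sample far from all known generators.
ood = rng.normal(size=dim) + 30.0
_, dist_ood = knn_attribute(ood)

# OOD detection: reject the attribution when the kNN distance exceeds a
# threshold calibrated on in-domain data (the value 5.0 is arbitrary here).
threshold = 5.0
print(label, dist_in < threshold, dist_ood >= threshold)
```

Because both attribution and OOD rejection reduce to distance lookups against stored embeddings, the approach needs no classifier training, which is what makes it "green": the only learned component is the frozen SSL feature extractor.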