STOPA: A Database of Systematic VariaTion Of DeePfake Audio for Open-Set Source Tracing and Attribution

Authors: Anton Firc, Manasi Chibber, Jagabandhu Mishra, Vishwanath Pratap Singh, Tomi Kinnunen, Kamil Malinka

Published: 2025-05-26 08:00:30+00:00

AI Summary

The paper introduces STOPA, a new dataset for deepfake audio source tracing. STOPA offers systematic variation in acoustic and vocoder models across 700k samples, enabling more reliable attribution of synthesized speech compared to existing datasets with limited variation.

Abstract

A key research area in deepfake speech detection is source tracing - determining the origin of synthesised utterances. The approaches may involve identifying the acoustic model (AM), vocoder model (VM), or other generation-specific parameters. However, progress is limited by the lack of a dedicated, systematically curated dataset. To address this, we introduce STOPA, a systematically varied and metadata-rich dataset for deepfake speech source tracing, covering 8 AMs, 6 VMs, and diverse parameter settings across 700k samples from 13 distinct synthesisers. Unlike existing datasets, which often feature limited variation or sparse metadata, STOPA provides a systematically controlled framework covering a broader range of generative factors, such as the choice of the vocoder model, acoustic model, or pretrained weights, ensuring higher attribution reliability. This control improves attribution accuracy, aiding forensic analysis, deepfake detection, and generative model transparency.


Key findings
Established models such as AASIST and ResNet yielded Equal Error Rates (EERs) exceeding 30% on STOPA's open-world source tracing task. This highlights the difficulty of zero-shot source tracing and the need for improved embedding extraction and comparison strategies. The limited number of attacks in STOPA's training partition likely hampered performance.
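The EER quoted above is the operating point at which the false-acceptance rate equals the false-rejection rate. The paper summary does not give a computation recipe, so the following is a minimal sketch of a standard threshold-sweep EER estimate (function name and inputs are illustrative, not from the paper):

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping a decision threshold over all scores.

    target_scores: scores for trials that should be accepted.
    nontarget_scores: scores for trials that should be rejected.
    Returns the rate where false acceptance and false rejection are closest.
    """
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    # False-acceptance rate: nontargets scoring at or above the threshold.
    far = np.array([(nontarget_scores >= t).mean() for t in thresholds])
    # False-rejection rate: targets scoring below the threshold.
    frr = np.array([(target_scores < t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0
```

With well-separated score distributions this returns 0.0; an EER above 0.30, as reported for STOPA's open-world protocol, means the attribution scores of known and unknown attacks overlap heavily.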
Approach
The authors frame source tracing as an open-world detection task and propose an evaluation protocol that allows new attack signatures to be integrated without retraining. They then use this protocol to evaluate several models on the new dataset.
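The summary does not specify how new attack signatures are integrated, but a common open-set scheme consistent with "no retraining" is to enroll each known attack as a centroid of its embeddings and attribute test utterances by cosine similarity, rejecting low-scoring trials as unknown. The sketch below assumes that scheme; the class, threshold, and attack names are illustrative:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

class OpenSetAttributor:
    """Attribute a test embedding to the closest enrolled attack signature,
    or reject it as 'unknown' when no similarity clears the threshold."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.signatures = {}  # attack name -> centroid embedding

    def enroll(self, name, embeddings):
        # Integrating a new attack needs no retraining:
        # just store the mean embedding as its signature.
        self.signatures[name] = np.mean(np.asarray(embeddings), axis=0)

    def attribute(self, embedding):
        if not self.signatures:
            return "unknown", 0.0
        scores = {n: cosine(embedding, c) for n, c in self.signatures.items()}
        best = max(scores, key=scores.get)
        if scores[best] < self.threshold:
            return "unknown", scores[best]
        return best, scores[best]
```

Under this framing, the quality of the embedding extractor (e.g. AASIST or ResNet front-ends) determines how separable the attack signatures are, which is exactly the weakness the reported EERs expose.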
Datasets
STOPA dataset (created by the authors), ASVspoof datasets (ASVspoof19 LA, ASVspoof21 DF, ASVspoof5), VCTK dataset, MLAAD dataset, TIMIT-TTS dataset, SemaFor dataset.
Model(s)
ResNet-34, AASIST CM (trained on ASVspoof2019 LA), AASIST STOPA (trained on STOPA training partition)
Author countries
Czech Republic, Finland