Source Verification for Speech Deepfakes

Authors: Viola Negroni, Davide Salvi, Paolo Bestagini, Stefano Tubaro

Published: 2025-05-20 10:42:48+00:00

AI Summary

This paper introduces the task of source verification for speech deepfakes: determining whether a test audio track was produced by the same generative model as a set of reference tracks. The approach extracts embeddings from a classifier trained for source attribution and compares them using cosine similarity to decide whether the tracks share a source.

Abstract

With the proliferation of speech deepfake generators, it becomes crucial not only to assess the authenticity of synthetic audio but also to trace its origin. While source attribution models attempt to address this challenge, they often struggle in open-set conditions against unseen generators. In this paper, we introduce the source verification task, which, inspired by speaker verification, determines whether a test track was produced using the same model as a set of reference signals. Our approach leverages embeddings from a classifier trained for source attribution, computing distance scores between tracks to assess whether they originate from the same source. We evaluate multiple models across diverse scenarios, analyzing the impact of speaker diversity, language mismatch, and post-processing operations. This work provides the first exploration of source verification, highlighting its potential and vulnerabilities, and offers insights for real-world forensic applications.

Key findings

ResNet18 achieved the best overall performance in source verification. The study highlighted the impact of speaker diversity and language mismatch on performance, showing that models trained on multi-speaker data generalize better to multi-speaker test sets. Post-processing operations, particularly speech enhancement, significantly degraded performance.

Approach

The proposed method trains a classifier for source attribution and uses it as a feature extractor. It then computes cosine similarity scores between embeddings of a test track and a reference set of tracks from a known generator to verify their common origin.
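The scoring step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the embeddings have already been extracted by the attribution classifier and are available as plain vectors. The function names, the averaging of per-reference scores, and the threshold value are all hypothetical choices made for the example.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero 1-D embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def verification_score(test_emb, ref_embs):
    """Score a test embedding against a reference set.

    Assumption: per-reference similarities are averaged; the source
    does not state how scores are aggregated over the reference set.
    """
    scores = [cosine_similarity(test_emb, ref) for ref in ref_embs]
    return float(np.mean(scores))


def same_source(test_emb, ref_embs, threshold=0.5):
    """Accept the test track as same-source if the score reaches the threshold.

    The default threshold is a placeholder; in practice it would be
    tuned on held-out data.
    """
    return verification_score(test_emb, ref_embs) >= threshold
```

A higher score means the test track's embedding lies closer to those of the reference generator. The decision threshold trades false acceptances against false rejections, as in speaker verification.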

Datasets

MLAAD (D_MLA), ASVspoof 2019 (D_ASV), TIMIT-TTS (D_TIM), ADD 2023 (D_ADD)

Model(s)

ResNet18, LCNN, RawNet2, AASIST

Author countries

Italy
