BUT Systems and Analyses for the ASVspoof 5 Challenge

Authors: Johan Rohdin, Lin Zhang, Oldřich Plchot, Vojtěch Staněk, David Mihola, Junyi Peng, Themos Stafylakis, Dmitriy Beveraki, Anna Silnova, Jan Brukner, Lukáš Burget

Published: 2024-08-20 19:18:20+00:00

AI Summary

This paper presents the Brno University of Technology (BUT) systems for the ASVspoof 5 challenge, focusing on deepfake detection and spoofing-robust automatic speaker verification (SASV). The main contributions include analyzing different label schemes for training deepfake detection models and proposing a logistic regression approach for jointly optimizing affine transformations of countermeasure and speaker verification scores in SASV.

Abstract

This paper describes the BUT submitted systems for the ASVspoof 5 challenge, along with analyses. For the conventional deepfake detection task, we use ResNet18 and self-supervised models for the closed and open conditions, respectively. In addition, we analyze and visualize different combinations of speaker information and spoofing information as label schemes for training. For spoofing-robust automatic speaker verification (SASV), we introduce effective priors and propose using logistic regression to jointly train affine transformations of the countermeasure scores and the automatic speaker verification scores in such a way that the SASV LLR is optimized.
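The "effective priors" mentioned in the abstract are the prior probabilities of the two kinds of negative trials (bona fide non-target vs. spoofed). Below is a minimal sketch of how a SASV LLR can be composed from separate ASV and CM log-likelihood ratios under a standard independence assumption; the function name and the priors `rho_non` and `rho_spf` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from scipy.special import logsumexp

def compose_sasv_llr(s_asv, s_cm, rho_non, rho_spf):
    """Compose a SASV LLR from an ASV LLR and a CM LLR (sketch).

    Assumes s_asv ~ log p(x | target) / p(x | non-target) and
    s_cm ~ log p(x | bona fide) / p(x | spoof). With priors rho_non and
    rho_spf (rho_non + rho_spf = 1) over the two negative-trial types,
    the target vs. {non-target, spoof} LLR is
        -log( rho_non * exp(-s_asv) + rho_spf * exp(-s_cm) ).
    """
    s_asv = np.asarray(s_asv, dtype=float)
    s_cm = np.asarray(s_cm, dtype=float)
    stacked = np.stack([np.log(rho_non) - s_asv,
                        np.log(rho_spf) - s_cm])
    return -logsumexp(stacked, axis=0)  # numerically stable log-sum-exp
```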


Key findings
In deepfake detection, integrating all spoofed samples as a single class improved performance. For SASV, the proposed logistic regression calibration improved min a-DCF by approximately 1%. The best-performing systems combined ResNet18 with self-supervised models for deepfake detection, and ResNet with self-supervised models for SASV.
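The a-DCF is the architecture-agnostic detection cost function used as the SASV metric in ASVspoof 5; it weights misses on target trials and false alarms on non-target and spoofed trials separately. The sketch below shows one rough, unnormalized way to compute min a-DCF by sweeping a single threshold; the cost weights and priors are placeholders, not the official challenge settings.

```python
import numpy as np

def min_a_dcf(scores, labels,
              c_miss=1.0, c_fa_non=1.0, c_fa_spf=1.0,
              pi_tar=0.9, pi_non=0.05, pi_spf=0.05):
    """Unnormalized min a-DCF sketch over pooled SASV scores.

    labels entries: 'target', 'nontarget', or 'spoof'. Cost weights and
    priors are illustrative placeholders; the official metric also
    normalizes by the cost of a default system.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    tar = scores[labels == 'target']
    non = scores[labels == 'nontarget']
    spf = scores[labels == 'spoof']

    best = np.inf
    for tau in np.unique(scores):
        p_miss = np.mean(tar < tau)      # targets rejected
        p_fa_non = np.mean(non >= tau)   # non-targets accepted
        p_fa_spf = np.mean(spf >= tau)   # spoofs accepted
        cost = (c_miss * pi_tar * p_miss
                + c_fa_non * pi_non * p_fa_non
                + c_fa_spf * pi_spf * p_fa_spf)
        best = min(best, cost)
    return best
```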
Approach
For deepfake detection, the authors used ResNet18 for the closed condition and self-supervised models for the open condition, and analyzed various label schemes combining speaker and spoofing information. For SASV, they introduced effective priors and used logistic regression to jointly train affine transformations of the countermeasure and speaker verification scores so that the SASV log-likelihood ratio is optimized.
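A minimal sketch of the kind of joint score calibration described here, assuming the LLR composition sketched earlier and a standard prior-weighted logistic regression (cross-entropy) objective over affine transforms of the two scores; the parameter names, the optimizer choice, and the effective prior `p_tar` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

def fit_joint_calibration(s_asv, s_cm, is_target, rho_non, rho_spf, p_tar=0.5):
    """Jointly fit affine transforms (a1, b1) of ASV scores and (a2, b2)
    of CM scores so that the composed SASV score acts as a calibrated LLR.

    Sketch only: minimizes a prior-weighted logistic regression loss of
        llr = -logsumexp([log(rho_non) - (a1 * s_asv + b1),
                          log(rho_spf) - (a2 * s_cm + b2)])
    over target vs. {non-target, spoof} trials.
    """
    s_asv = np.asarray(s_asv, dtype=float)
    s_cm = np.asarray(s_cm, dtype=float)
    tar = np.asarray(is_target, dtype=bool)
    logit_prior = np.log(p_tar / (1.0 - p_tar))

    def loss(params):
        a1, b1, a2, b2 = params
        stacked = np.stack([np.log(rho_non) - (a1 * s_asv + b1),
                            np.log(rho_spf) - (a2 * s_cm + b2)])
        llr = -logsumexp(stacked, axis=0)
        # prior-weighted binary cross-entropy in softplus form (numerically stable)
        ce_tar = np.mean(np.logaddexp(0.0, -(llr[tar] + logit_prior)))
        ce_non = np.mean(np.logaddexp(0.0, llr[~tar] + logit_prior))
        return p_tar * ce_tar + (1.0 - p_tar) * ce_non

    result = minimize(loss, x0=np.array([1.0, 0.0, 1.0, 0.0]), method="Nelder-Mead")
    return result.x  # fitted (a1, b1, a2, b2)
```

At test time, the fitted affine parameters would be applied to new ASV/CM score pairs and the same composition would yield the final SASV score.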
Datasets
Multilingual LibriSpeech (MLS) dataset (English subset), synthetic data from community volunteers, MUSAN (noise subset), VoxCeleb2
Model(s)
ResNet18, self-supervised models (wav2vec 2.0, WavLM, HuBERT, data2vec), ResNet34, ResNet101, ResNet221
Author countries
Czech Republic, United Kingdom