XWSB: A Blend System Utilizing XLS-R and WavLM with SLS Classifier detection system for SVDD 2024 Challenge

Authors: Qishan Zhang, Shuangbing Wen, Fangke Yan, Tao Hu, Jun Li

Published: 2024-09-27 08:55:51+00:00

AI Summary

This paper presents XWSB, a state-of-the-art system for singing voice deepfake detection that achieved an EER of 2.32% in the CtrSVDD track of the SVDD 2024 challenge. XWSB blends the XLS-R and WavLM models, each coupled with an SLS classifier, and uses a max voting strategy to reach the final decision.

Abstract

This paper introduces the model structure used in the SVDD 2024 Challenge, which was held this year for the first time. Singing voice deepfake detection (SVDD) faces complexities due to informal speech intonations and varying speech rates. In this paper, we propose the XWSB system, which achieved SOTA performance in the SVDD challenge. XWSB stands for XLS-R, WavLM, and SLS Blend, representing the integration of these technologies for the purpose of SVDD. Specifically, we used XLS-R&SLS, the best-performing model structure on the ASVspoof DF dataset, and applied SLS to WavLM to form the WavLM&SLS structure. Finally, we integrated the two models to form the XWSB system. Experimental results show that our system demonstrates advanced recognition capabilities in the SVDD challenge, specifically achieving an EER of 2.32% in the CtrSVDD track. The code and data can be found at https://github.com/QiShanZhang/XWSB_for_SVDD2024.

Key findings

XWSB achieved state-of-the-art performance in the CtrSVDD 2024 challenge with an EER of 2.32%. The SLS classifier proved effective when combined with both XLS-R and WavLM. The ensemble method improved performance over individual models, except when dealing with the Acesinger dataset.
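
For readers unfamiliar with the metric, the equal error rate (EER) is the operating point at which the false acceptance rate (deepfakes accepted as bona fide) equals the false rejection rate (bona fide clips rejected). The sketch below is a simple threshold sweep in plain Python, not the challenge's official scoring script. The function name `compute_eer` and the example scores are hypothetical, and it assumes that a higher score means "more likely bona fide".

```python
from typing import Sequence


def compute_eer(bonafide_scores: Sequence[float],
                spoof_scores: Sequence[float]) -> float:
    """Approximate the EER by sweeping a threshold over all observed scores."""
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection rate: bona fide clips scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed clips scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Made-up scores for illustration only (not results from the paper).
bonafide = [0.9, 0.8, 0.7, 0.35]
spoof = [0.1, 0.2, 0.3, 0.6]
eer = compute_eer(bonafide, spoof)
print(f"EER = {eer:.2%}")
```

With these toy scores, one bona fide clip and one spoofed clip are misclassified at the balancing threshold, so both error rates are 25%. A reported EER of 2.32% means the two error rates meet at roughly 2.3 errors per 100 clips of each class.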

Approach

The XWSB system integrates two models: XLS-R&SLS and WavLM&SLS. Each model uses a pre-trained model (XLS-R or WavLM) to extract features, which are then processed by an SLS classifier. A max voting scheme combines the outputs of the two models for a final prediction.
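
The fusion step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that each model emits one bona fide score per utterance and that "max voting" means taking the element-wise maximum of the two scores. The function names, the 0.5 threshold, and the example scores are all hypothetical.

```python
from typing import List, Sequence


def max_fusion(scores_a: Sequence[float], scores_b: Sequence[float]) -> List[float]:
    """Fuse two per-utterance score lists by taking the element-wise maximum."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both models must score the same utterances")
    return [max(a, b) for a, b in zip(scores_a, scores_b)]


def decide(scores: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Label an utterance 'bonafide' when its fused score reaches the threshold."""
    return ["bonafide" if s >= threshold else "deepfake" for s in scores]


# Hypothetical scores for three clips (higher = more likely bona fide).
xlsr_sls_scores = [0.91, 0.12, 0.40]
wavlm_sls_scores = [0.85, 0.30, 0.65]

fused = max_fusion(xlsr_sls_scores, wavlm_sls_scores)
labels = decide(fused)
print(fused)   # [0.91, 0.3, 0.65]
print(labels)  # ['bonafide', 'deepfake', 'bonafide']
```

Under this reading, a clip is accepted as bona fide when either model is confident enough, so the third clip is accepted on the strength of the WavLM&SLS score alone. Other readings of "max voting" (for example, taking the maximum over class posteriors) would change the fusion function but not the overall structure.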

Datasets

CtrSVDD 2024 challenge dataset

Model(s)

XLS-R, WavLM, SLS classifier

Author countries

China
