A Comparative Study of Fusion Methods for SASV Challenge 2022

Authors: Petr Grinberg, Vladislav Shikhov

Published: 2022-03-31 11:43:01+00:00

AI Summary

This paper investigates fusion methods for combining embeddings from Automatic Speaker Verification (ASV) and countermeasure (CM) systems in the Spoofing Aware Speaker Verification (SASV) Challenge 2022. Among the techniques explored, boosting over embeddings with CatBoost outperforms the challenge baselines and the other classifiers tested.

Abstract

An Automatic Speaker Verification (ASV) system is a type of biometric authentication. It can be attacked by an intruder who falsifies data in order to gain access to protected information. Countermeasures (CM) are special algorithms that detect these spoofing attacks. While the ASVspoof Challenge series focused on developing CMs for a fixed ASV system, the organizers of the new Spoofing Aware Speaker Verification (SASV) Challenge believe that the best results can be achieved if the CM and ASV systems are optimized jointly. One approach to cooperative optimization is fusion over embeddings or scores obtained from ASV and CM models. The baselines of SASV Challenge 2022 present two types of fusion: score-sum and a back-end ensemble with a 3-layer MLP. This paper describes our research into other fusion methods, including boosting over embeddings, which has not been used in anti-spoofing studies before.
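
As a rough illustration of the score-sum idea mentioned in the abstract, the sketch below adds an ASV score to a CM score to obtain one fused score per trial. It is a minimal sketch under stated assumptions, not the challenge's reference implementation: using cosine similarity as the ASV score, treating the CM score as a bona fide probability, and the names `cosine_similarity` and `score_sum_fusion` are all hypothetical choices made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def score_sum_fusion(asv_score, cm_score):
    """Fuse the two subsystem scores by simple addition."""
    return asv_score + cm_score


# Toy embeddings (assumed values): the test utterance matches the enrollment.
enroll_embedding = [1.0, 0.0, 1.0]
test_embedding = [1.0, 0.0, 1.0]
asv_score = cosine_similarity(enroll_embedding, test_embedding)

# Assumed CM outputs: a higher value means "more likely bona fide".
fused_bonafide = score_sum_fusion(asv_score, cm_score=0.9)
fused_spoof = score_sum_fusion(asv_score, cm_score=0.1)

# With the same speaker match, the spoofed trial receives the lower fused score.
print(fused_bonafide > fused_spoof)
```

A single threshold on the fused score then accepts or rejects the trial. In practice the two scores may need calibration to comparable ranges before they are summed.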


Key findings
CatBoost, a gradient boosting method applied to the concatenated embeddings, significantly outperforms the baseline methods and the other tested classifiers in the SASV 2022 challenge. The other fusion methods vary in performance: for embedding fusion, some non-linear methods perform better than linear ones, whereas for score fusion, logistic regression and SVM give the better results.
Approach
The researchers concatenate embeddings from multiple ASV and CM models (including RawNet2 and LightCNN with various spectrograms) into a single feature vector. This vector is then fed to one of several classifiers (MLP, Logistic Regression, SVM, CatBoost, etc.) to detect spoofing attacks.
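
The concatenate-then-classify pipeline described above can be sketched as follows. This is an illustrative example only, not the authors' code. The embeddings are synthetic, the dimensions `ASV_DIM` and `CM_DIM` are made up, and a small NumPy logistic regression stands in for the classifiers listed (CatBoost or an MLP would take the same fused matrix as input). The helper names `make_trials`, `fuse`, and `fit_logreg` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embedding sizes; real ASV and CM embeddings are much larger.
ASV_DIM, CM_DIM = 8, 4


def make_trials(n):
    """Generate synthetic trials with label 1 = accept, 0 = reject."""
    labels = rng.integers(0, 2, size=n)
    asv_emb = rng.normal(size=(n, ASV_DIM))
    # Shift the CM embedding by class so that the toy problem is learnable.
    cm_emb = rng.normal(size=(n, CM_DIM)) + 2.0 * labels[:, None]
    return asv_emb, cm_emb, labels


def fuse(asv_emb, cm_emb):
    """Concatenate per-trial embeddings into one feature vector per row."""
    return np.concatenate([asv_emb, cm_emb], axis=1)


def fit_logreg(x, y, lr=0.1, steps=500):
    """Fit logistic regression with plain batch gradient descent."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        grad = p - y
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


asv_tr, cm_tr, y_train = make_trials(400)
asv_te, cm_te, y_test = make_trials(200)
x_train, x_test = fuse(asv_tr, cm_tr), fuse(asv_te, cm_te)

# Standardize with training statistics only, then fit and evaluate.
mean, std = x_train.mean(axis=0), x_train.std(axis=0)
x_train, x_test = (x_train - mean) / std, (x_test - mean) / std

w, b = fit_logreg(x_train, y_train)
predictions = (x_test @ w + b > 0).astype(int)
accuracy = (predictions == y_test).mean()
print(f"held-out accuracy: {accuracy:.2f}")
```

Because fusion here is only a concatenation, swapping the classifier leaves the feature construction untouched, which is what makes a side-by-side comparison of back-ends straightforward.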
Datasets
VoxCeleb2, ASVspoof 2019 LA train partition
Model(s)
ECAPA-TDNN (ASV), AASIST, RawNet2, LightCNN (CM), 3-layer MLP, Logistic Regression, SVM (linear, RBF, polynomial kernels), Random Fourier Features (RFF) with logistic regression, Gaussian Mixture Model (GMM), Random Forest, CatBoost
Author countries
Russia