Spoofing-Aware Speaker Verification with Unsupervised Domain Adaptation

Authors: Xuechen Liu, Md Sahidullah, Tomi Kinnunen

Published: 2022-03-21 14:02:06+00:00

AI Summary

This paper proposes a method for improving the spoofing robustness of automatic speaker verification (ASV) systems without a separate countermeasure module. It applies unsupervised domain adaptation to the back-end probabilistic linear discriminant analysis (PLDA) classifier, using audio from the training partition of the ASVspoof 2019 dataset, and reports relative improvements of up to 36.1% on bonafide cases and 5.3% on spoofed cases.

Abstract

In this paper, we address the problem of enhancing the spoofing robustness of the automatic speaker verification (ASV) system without relying primarily on a separate countermeasure module. We start from the standard ASV framework of the ASVspoof 2019 baseline and approach the problem through the back-end classifier, which is based on probabilistic linear discriminant analysis. We employ three unsupervised domain adaptation techniques to optimize the back-end using the audio data in the training partition of the ASVspoof 2019 dataset. We demonstrate notable improvements in both logical and physical access scenarios, especially in the latter, where the system is attacked with replayed audio, with a maximum of 36.1% and 5.3% relative improvement on bonafide and spoofed cases, respectively. We also perform additional studies, including a per-attack breakdown analysis, an analysis of data composition, and score-level integration with a countermeasure system using a Gaussian back-end.


Key findings

The proposed method yields notable improvements in both logical and physical access scenarios, particularly the latter, where the system is attacked with replayed audio. Maximum relative improvements of 36.1% and 5.3% were observed for bonafide and spoofed cases, respectively, in the physical access scenario. The results suggest that unsupervised domain adaptation can strengthen an ASV system's resistance to spoofing, especially replay attacks.

Approach

The authors enhance the spoofing robustness of an ASV system by applying unsupervised domain adaptation to the back-end PLDA classifier. Three unsupervised domain adaptation techniques (CORAL, CORAL+, and Kaldi adaptation) are used to optimize the PLDA using audio data from the ASVspoof 2019 dataset's training partition. This approach avoids the need for a separate countermeasure module.
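
To illustrate the general idea behind CORAL-style adaptation, the sketch below aligns the covariance of out-of-domain embeddings with that of unlabeled in-domain embeddings by whitening and then re-coloring them. A PLDA model could afterwards be retrained on the transformed data. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function names `coral_transform` and `_sym_matrix_power`, the regularization constant `reg`, and the shift to the in-domain mean are hypothetical choices made for the example. CORAL+ and Kaldi adaptation are not covered.

```python
import numpy as np


def _sym_matrix_power(mat, power, eps=1e-8):
    """Raise a symmetric positive semi-definite matrix to a real power."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, eps, None)  # guard against tiny or negative eigenvalues
    return (vecs * vals**power) @ vecs.T


def coral_transform(source, target, reg=1e-6):
    """Align second-order statistics of `source` rows to those of `target`.

    source: (n_source, dim) out-of-domain embeddings (e.g. x-vectors).
    target: (n_target, dim) unlabeled in-domain embeddings.
    reg:    assumed diagonal regularization for numerical stability.
    """
    dim = source.shape[1]
    cov_s = np.cov(source, rowvar=False) + reg * np.eye(dim)
    cov_t = np.cov(target, rowvar=False) + reg * np.eye(dim)

    whiten = _sym_matrix_power(cov_s, -0.5)   # removes source correlations
    recolor = _sym_matrix_power(cov_t, 0.5)   # imposes target correlations

    # Assumption: also move the data to the in-domain mean.
    centered = source - source.mean(axis=0)
    return centered @ whiten @ recolor + target.mean(axis=0)
```

In this sketch, no speaker or spoofing labels from the in-domain data are used: only its mean and covariance enter the transform, which is what makes the adaptation unsupervised.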

Datasets

ASVspoof 2019, VoxCeleb1, VoxCeleb2

Model(s)

Probabilistic Linear Discriminant Analysis (PLDA), x-vector (based on E-TDNN with attentive statistics pooling and additive angular softmax loss)

Author countries

Finland, France