Generalization of Spoofing Countermeasures: a Case Study with ASVspoof 2015 and BTAS 2016 Corpora

Authors: Dipjyoti Paul, Md Sahidullah, Goutam Saha

Published: 2019-01-23 17:55:39+00:00

AI Summary

This paper investigates how well spoofing countermeasures for voice-based biometric systems generalize. It evaluates detection performance across spoofing types using MFCC and CQCC features with a GMM-ML classifier on the ASVspoof 2015 and BTAS 2016 corpora, and shows that generalization capability varies across spoofing types.

Abstract

Voice-based biometric systems are highly prone to spoofing attacks. Recently, various countermeasures have been developed for detecting different kinds of attacks such as replay, speech synthesis (SS) and voice conversion (VC). Most of the existing studies are conducted with a specific training set defined by the evaluation protocol. However, for realistic scenarios, selecting appropriate training data is an open challenge for the system administrator. Motivated by this practical concern, this work investigates the generalization capability of spoofing countermeasures in restricted training conditions where speech from broad attack types is left out of the training database. We demonstrate that different spoofing types have considerably different generalization capabilities. For this study, we analyze the performance using two kinds of features: mel-frequency cepstral coefficients (MFCCs), which are considered the baseline, and the recently proposed constant Q cepstral coefficients (CQCCs). The experiments are conducted with a standard Gaussian mixture model - maximum likelihood (GMM-ML) classifier on two recently released spoofing corpora, ASVspoof 2015 and BTAS 2016, and include cross-corpora performance analysis. Feature-level analysis suggests that both static and dynamic coefficients of spectral features are important for detecting spoofing attacks in real-life conditions.


Key findings
Different spoofing types exhibit significantly different generalization capabilities. Direct replay data show better generalization than replayed speech synthesis (SS) and voice conversion (VC) data. CQCC features generally outperform MFCCs, particularly in detecting unknown attacks. Cross-corpora performance is poor due to data mismatch.
Approach
The authors evaluated the generalization capability of spoofing detection systems by training models on subsets of spoofing data (excluding certain attack types) and testing on the full dataset. They compared Mel-Frequency Cepstral Coefficient (MFCC) and Constant Q Cepstral Coefficient (CQCC) features, each paired with a Gaussian Mixture Model - Maximum Likelihood (GMM-ML) classifier.
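The restricted training condition described above can be illustrated with a minimal sketch. The utterance IDs, the label names, and the helper `restricted_training_set` are hypothetical and chosen only for illustration; the actual corpus protocols and file lists are not reproduced here.

```python
from collections import Counter

# Hypothetical utterance records: (utterance_id, label), where label is
# "genuine" or an attack-type name. IDs and labels are illustrative only.
utterances = [
    ("u01", "genuine"),
    ("u02", "genuine"),
    ("u03", "replay"),
    ("u04", "SS"),
    ("u05", "VC"),
    ("u06", "replay"),
]


def restricted_training_set(records, held_out_attacks):
    """Keep genuine speech and every attack type except the held-out ones."""
    held_out = set(held_out_attacks)
    return [(uid, label) for uid, label in records if label not in held_out]


# Train without any replay data; the test set remains the full list,
# so replay becomes an attack type unseen during training.
train = restricted_training_set(utterances, ["replay"])
test = utterances

print(Counter(label for _, label in train))
```

Repeating the split with a different held-out attack type each time gives one restricted model per condition, which can then be compared on the same full test set.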
Datasets
ASVspoof 2015 and BTAS 2016 corpora
Model(s)
Gaussian Mixture Model - Maximum Likelihood (GMM-ML)
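As a rough illustration of GMM-based scoring, the sketch below assumes the common two-class setup in which one GMM models genuine speech, another models spoofed speech, and an utterance is scored by the average per-frame log-likelihood ratio. That setup, the diagonal covariances, and all parameter values are assumptions made for this example; the model parameters are taken as already trained (maximum-likelihood training via EM is not shown), and the toy 2-D frames stand in for real MFCC or CQCC vectors.

```python
import math


def gmm_frame_loglik(x, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    comp = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            ll += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        comp.append(ll)
    # Log-sum-exp over components for numerical stability.
    m = max(comp)
    return m + math.log(sum(math.exp(c - m) for c in comp))


def llr_score(frames, genuine_gmm, spoof_gmm):
    """Average per-frame log-likelihood ratio; higher means more genuine-like."""
    total = 0.0
    for x in frames:
        total += gmm_frame_loglik(x, *genuine_gmm) - gmm_frame_loglik(x, *spoof_gmm)
    return total / len(frames)


# Toy models as (weights, means, variances); all values are made up.
genuine_gmm = ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]])
spoof_gmm = ([1.0], [[5.0, 5.0]], [[1.0, 1.0]])

frames_near_genuine = [[0.1, -0.2], [0.9, 1.1]]
frames_near_spoof = [[4.8, 5.1], [5.2, 4.9]]

print(llr_score(frames_near_genuine, genuine_gmm, spoof_gmm))
print(llr_score(frames_near_spoof, genuine_gmm, spoof_gmm))
```

In this toy case the first score is positive and the second negative; a decision threshold on the score would separate the two classes.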
Author countries
India, Finland