Generalization of Spoofing Countermeasures: a Case Study with ASVspoof 2015 and BTAS 2016 Corpora

Authors: Dipjyoti Paul, Md Sahidullah, Goutam Saha

Published: 2019-01-23 17:55:39+00:00

Journal Ref: Published in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017), New Orleans, LA, USA

AI Summary

This paper investigates how well spoofing countermeasures generalize under restricted training conditions, specifically when certain attack types are excluded from the training data. It analyzes performance using MFCC and CQCC features with a GMM-ML classifier on the ASVspoof 2015 and BTAS 2016 corpora, including cross-corpora analysis. The study shows that generalization capability varies across spoofing types and highlights the importance of both static and dynamic spectral feature coefficients for detection in real-life conditions.

Abstract

Voice-based biometric systems are highly prone to spoofing attacks. Recently, various countermeasures have been developed for detecting different kinds of attacks such as replay, speech synthesis (SS) and voice conversion (VC). Most existing studies are conducted with a specific training set defined by the evaluation protocol. In realistic scenarios, however, selecting appropriate training data is an open challenge for the system administrator. Motivated by this practical concern, this work investigates the generalization capability of spoofing countermeasures in restricted training conditions where speech from broad attack types is left out of the training database. We demonstrate that different spoofing types have considerably different generalization capabilities. For this study, we analyze the performance using two kinds of features: mel-frequency cepstral coefficients (MFCCs), which are considered the baseline, and the recently proposed constant Q cepstral coefficients (CQCCs). The experiments are conducted with a standard Gaussian mixture model - maximum likelihood (GMM-ML) classifier on two recently released spoofing corpora, ASVspoof 2015 and BTAS 2016, and include cross-corpora performance analysis. Feature-level analysis suggests that both the static and the dynamic coefficients of spectral features are important for detecting spoofing attacks in real-life conditions.


Key findings

Different spoofing attack types have considerably different generalization capabilities: direct replay data generalizes better than SS- and VC-based replayed data. Compared with MFCCs, CQCC features consistently yielded superior results, particularly in detecting unknown attacks and across the generalization scenarios studied. Both the static and the dynamic parts of spectral features were found to be important for spoofing detection that generalizes.
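
To make the static/dynamic distinction concrete, the sketch below appends delta and delta-delta coefficients to a sequence of static cepstral frames. It uses the common regression-based delta formula with edge-frame repetition; the source does not state how the paper computes its dynamic coefficients, so the formula, the `width` parameter, and the function names are assumptions for illustration only.

```python
def delta(frames, width=2):
    """Regression-based delta of a list of per-frame coefficient vectors.

    Assumption: the widely used formula
    d_t = sum_n n * (c_{t+n} - c_{t-n}) / (2 * sum_n n^2),
    with the first and last frames repeated at the edges.
    """
    n_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(n_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                nxt = frames[min(t + n, n_frames - 1)][k]  # clamp at last frame
                prv = frames[max(t - n, 0)][k]             # clamp at first frame
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_static_dynamic(static, width=2):
    """Concatenate static, delta, and delta-delta coefficients per frame."""
    d = delta(static, width)
    dd = delta(d, width)
    return [s + a + b for s, a, b in zip(static, d, dd)]
```

A feature-level analysis like the one summarized above could then compare a classifier trained on `static` alone against one trained on the stacked vectors.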

Approach

Spoofing countermeasures are trained with a Gaussian Mixture Model - Maximum Likelihood (GMM-ML) classifier using either Mel-Frequency Cepstral Coefficients (MFCCs) or Constant Q Cepstral Coefficients (CQCCs) as features. Generalization is assessed by systematically excluding specific spoofing attack types from the training data and then evaluating performance on both known and previously unseen attack types, within and across datasets.
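
The restricted-training idea described above can be sketched as a simple filtering step. The record layout (dictionaries with `id`, `label`, and `attack` keys), the function names, and the toy corpus are hypothetical; the actual corpus protocols and file formats are not given in the source.

```python
def restricted_training_set(utterances, held_out):
    """Keep all genuine speech, drop spoofed speech of any held-out attack type."""
    return [u for u in utterances
            if u["label"] == "genuine" or u["attack"] not in held_out]


def split_eval_by_exposure(utterances, held_out):
    """Split spoofed evaluation utterances into known vs. unseen attack types."""
    spoofed = [u for u in utterances if u["label"] == "spoof"]
    known = [u for u in spoofed if u["attack"] not in held_out]
    unseen = [u for u in spoofed if u["attack"] in held_out]
    return known, unseen


# Toy corpus (hypothetical records) using the attack types named in the abstract.
corpus = [
    {"id": "utt1", "label": "genuine", "attack": None},
    {"id": "utt2", "label": "spoof", "attack": "replay"},
    {"id": "utt3", "label": "spoof", "attack": "SS"},
    {"id": "utt4", "label": "spoof", "attack": "VC"},
]
train = restricted_training_set(corpus, held_out={"SS", "VC"})
known, unseen = split_eval_by_exposure(corpus, held_out={"SS", "VC"})
```

Here the countermeasure would be trained on genuine and replay speech only, then scored separately on the `known` and `unseen` groups to see how well it generalizes.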

Datasets

ASVspoof 2015, BTAS 2016

Model(s)

Gaussian Mixture Model - Maximum Likelihood (GMM-ML) classifier, with mel-frequency cepstral coefficients (MFCCs) and constant Q cepstral coefficients (CQCCs) as input features.
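
As a rough illustration of how a GMM-based back-end can score an utterance, the sketch below evaluates two diagonal-covariance GMMs, one for genuine and one for spoofed speech, and returns the average per-frame log-likelihood ratio. This two-model scoring scheme is a common arrangement and is assumed here rather than taken from the source; the GMM parameters are supplied directly (training by maximum likelihood, typically via EM, is omitted), and all names are hypothetical.

```python
import math


def gmm_frame_loglik(x, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    comp = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, m, v in zip(x, mu, var):
            ll += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        comp.append(ll)
    top = max(comp)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(c - top) for c in comp))


def llr_score(frames, genuine_gmm, spoof_gmm):
    """Average per-frame log-likelihood ratio; higher means more genuine-like."""
    g = sum(gmm_frame_loglik(x, *genuine_gmm) for x in frames) / len(frames)
    s = sum(gmm_frame_loglik(x, *spoof_gmm) for x in frames) / len(frames)
    return g - s


# Toy one-dimensional models: (weights, means, variances), illustrative values only.
genuine_gmm = ([1.0], [[0.0]], [[1.0]])
spoof_gmm = ([1.0], [[3.0]], [[1.0]])
score_a = llr_score([[0.1], [-0.2]], genuine_gmm, spoof_gmm)  # frames near the genuine mean
score_b = llr_score([[2.9], [3.3]], genuine_gmm, spoof_gmm)   # frames near the spoof mean
```

An utterance would be accepted as genuine when its score exceeds a threshold chosen on development data.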

Author countries

India, Finland