Dataset artefacts in anti-spoofing systems: a case study on the ASVspoof 2017 benchmark

Authors: Bhusan Chettri, Emmanouil Benetos, Bob L. T. Sturm

Published: 2020-10-15 17:46:49+00:00

AI Summary

This paper investigates how artefacts in the ASVspoof 2017 dataset contribute to the apparent success of published spoofing detection systems. The authors show how these artefacts can be exploited to manipulate model decisions, and propose a framework incorporating speech endpoint detection to improve model robustness and trustworthiness.

Abstract

The Automatic Speaker Verification Spoofing and Countermeasures Challenges motivate research in protecting speech biometric systems against a variety of access attacks. The 2017 edition focused on replay spoofing attacks, and involved participants building and training systems on a provided dataset (ASVspoof 2017). More than 60 research papers have so far been published with this dataset, but none have sought to answer why countermeasures appear successful in detecting spoofing attacks. This article shows how artefacts inherent to the dataset may be contributing to the apparent success of published systems. We first inspect the ASVspoof 2017 dataset and summarize various artefacts present in it. Second, we demonstrate how countermeasure models can exploit these artefacts to appear successful on this dataset. Third, for reliable and robust performance estimates on this dataset we propose discarding nonspeech segments and silence before and after the speech utterance during training and inference. We create speech start- and endpoint annotations in the dataset and demonstrate how using them makes countermeasure models less vulnerable to manipulation via artefacts found in the dataset. Finally, we provide several new benchmark results for both frame-level and utterance-level models that can serve as new baselines on this dataset.


Key findings
Artefacts in the ASVspoof 2017 dataset significantly influence model performance, leading to overestimated accuracy. Manipulating test audio using these artefacts easily fooled the models, while incorporating speech endpoint detection during training and inference significantly improved model robustness and trustworthiness.
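The manipulations referred to here copy dataset artefacts (such as burst clicks and silence patterns) onto test audio to probe whether a countermeasure's decision depends on them. A minimal sketch of that kind of signal-level intervention is below; the specific artefact shapes (a white-noise click followed by fixed-length silence) are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def prepend_artefact(signal, sr, click_ms=5, silence_ms=100, click_amp=0.8, seed=0):
    """Prepend a short burst click followed by silence to an utterance.

    Mimics the class of interventions described in the paper: grafting
    dataset-like artefacts onto test audio to check whether a spoofing
    countermeasure's score changes. Click/silence parameters here are
    hypothetical, chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    # Short wideband "click" (white noise burst), then trailing silence.
    click = click_amp * rng.uniform(-1.0, 1.0, int(sr * click_ms / 1000))
    silence = np.zeros(int(sr * silence_ms / 1000))
    return np.concatenate([click, silence, signal])
```

Comparing a model's score on `signal` and on `prepend_artefact(signal, sr)` reveals whether the decision is driven by such cues rather than by the speech itself.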
Approach
The authors analyzed the ASVspoof 2017 dataset, identifying artefacts such as burst click sounds and silence patterns. They then conducted intervention experiments, manipulating test audio to exploit these artefacts and assess their impact on various countermeasure models. Finally, they proposed a framework using speech endpoint detection to build more robust models.
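The endpoint-detection step amounts to discarding nonspeech segments before and after the utterance prior to feature extraction. A minimal energy-based sketch of such trimming is shown below; the paper uses manually created speech start/endpoint annotations, so this simple frame-energy gate (and its threshold) is an assumption for illustration only:

```python
import numpy as np

def trim_nonspeech(signal, sr, frame_ms=25, hop_ms=10, threshold_db=-35.0):
    """Keep only the span between the first and last 'speech' frame.

    A frame counts as speech when its log energy is within `threshold_db`
    of the loudest frame in the utterance. This is a stand-in for proper
    endpoint annotations, not the paper's annotation procedure.
    """
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    if len(signal) < frame:
        return signal
    n_frames = 1 + (len(signal) - frame) // hop
    energy = np.array(
        [np.sum(signal[i * hop : i * hop + frame] ** 2) for i in range(n_frames)]
    )
    log_e = 10.0 * np.log10(energy + 1e-12)  # dB, with floor to avoid log(0)
    speech = np.where(log_e > log_e.max() + threshold_db)[0]
    if speech.size == 0:
        return signal  # nothing passes the gate; leave the input untouched
    start = speech[0] * hop
    end = min(speech[-1] * hop + frame, len(signal))
    return signal[start:end]
```

Applying such trimming during both training and inference removes the leading/trailing silence and click regions where the dataset artefacts live, forcing models to rely on the speech content itself.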
Datasets
ASVspoof 2017 v2.0
Model(s)
GMMs, cosine distance, support vector machines (SVMs), two CNNs (CNN1 and CNN2), and a proposed DNN.
Author countries
UK, Sweden