A Study On Convolutional Neural Network Based End-To-End Replay Anti-Spoofing


Authors: Bhusan Chettri, Saumitra Mishra, Bob L. Sturm, Emmanouil Benetos

Published: 2018-05-22 14:53:13+00:00

AI Summary

This paper investigates end-to-end Convolutional Neural Networks (CNNs) for replay attack detection on the ASVspoof 2017 challenge data. The authors find a compact CNN that generalizes well to the development dataset, but the CNNs they study struggle to generalize to the evaluation dataset. This shows how hard it is to achieve consistent performance across the two data distributions.

Abstract

The second Automatic Speaker Verification Spoofing and Countermeasures challenge (ASVspoof 2017) focused on replay attack detection. The best deep-learning systems to compete in ASVspoof 2017 used Convolutional Neural Networks (CNNs) as a feature extractor. In this paper, we study their performance in an end-to-end setting. We find that these architectures show poor generalization in the evaluation dataset, but find a compact architecture that shows good generalization on the development data. We demonstrate that for this dataset it is not easy to obtain a similar level of generalization on both the development and evaluation data. This leads to a variety of open questions about what the differences are in the data; why these are more evident in an end-to-end setting; and how these issues can be overcome by increasing the training data.

Key findings

The study shows that, with CNNs, it is difficult to achieve consistent generalization across the development and evaluation datasets of ASVspoof 2017. Even the best-performing CNN architecture shows a large gap in Equal Error Rate (EER) between the two datasets. The authors propose a lightweight CNN architecture that generalizes well to the development data, but it still fails to generalize well to the evaluation set.
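The findings are stated in terms of the Equal Error Rate (EER): the operating point at which the false acceptance rate (spoofed audio accepted as genuine) equals the false rejection rate (genuine audio rejected). The sketch below approximates the EER with a simple threshold sweep, assuming higher scores mean "more likely genuine". The function name `compute_eer` and the toy scores are illustrative only, and official ASVspoof scoring tools may interpolate the operating point differently, so treat this as an approximation rather than the challenge's reference implementation.

```python
def compute_eer(genuine_scores, spoof_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    Assumes higher scores mean "more likely genuine". Returns (eer, threshold).
    """
    if not genuine_scores or not spoof_scores:
        raise ValueError("need at least one genuine and one spoof score")
    best = None  # (|FRR - FAR|, EER estimate, threshold)
    for t in sorted(set(genuine_scores) | set(spoof_scores)):
        # False rejection: genuine trials scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            best = (gap, (frr + far) / 2, t)
    return best[1], best[2]


# Toy scores made up for illustration; these are not results from the paper.
genuine = [0.9, 0.8, 0.7, 0.4]
spoof = [0.1, 0.2, 0.3, 0.6]
eer, threshold = compute_eer(genuine, spoof)
print(f"EER = {eer:.2%} at threshold {threshold}")  # prints: EER = 25.00% at threshold 0.6
```

Trying every observed score as a threshold keeps the sketch short at O(n²) cost; sorting the scores once and sweeping would bring it to O(n log n) for evaluation sets of realistic size.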

Approach

The researchers explored multiple CNN architectures for end-to-end audio replay attack detection using the ASVspoof 2017 dataset. They experimented with different network configurations, including a replication of a state-of-the-art system, and investigated how hyperparameters such as dropout rate and batch size affect model generalization.
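One of the hyperparameters examined is the dropout rate. In the common "inverted dropout" formulation, each activation is zeroed with probability equal to the rate during training and the surviving activations are scaled by 1/(1 - rate), so no rescaling is needed at inference. The pure-Python sketch below is a generic illustration of that technique, not the authors' training code; the function name `inverted_dropout` and its parameters are hypothetical.

```python
import random


def inverted_dropout(activations, rate, training=True, rng=None):
    """Zero each activation with probability `rate`; rescale survivors by 1/(1 - rate).

    At inference (training=False) the input is returned unchanged, because the
    rescaling during training already keeps the expected activation the same.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep_prob = 1.0 - rate
    # rng.random() is uniform on [0, 1), so `>= rate` keeps a unit with probability keep_prob.
    return [a / keep_prob if rng.random() >= rate else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
layer = [1.0] * 10_000
dropped = inverted_dropout(layer, rate=0.5, rng=rng)
print("fraction zeroed:", sum(a == 0.0 for a in dropped) / len(dropped))
print("mean activation:", sum(dropped) / len(dropped))  # stays close to 1.0
```

A higher rate injects more noise into training, which is why it is a natural knob to vary when studying how well a model generalizes beyond its training data.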

Datasets

ASVspoof 2017 database version 1

Model(s)

Various Convolutional Neural Networks (CNNs), including replications of architectures from the ASVspoof 2017 challenge and a novel lightweight architecture proposed by the authors. Gaussian Mixture Models (GMMs) were also used in some experiments.
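The ASVspoof 2017 baseline scored each utterance with a log-likelihood ratio between a GMM trained on genuine speech and a GMM trained on spoofed speech. This summary does not say exactly how the GMMs in this paper were configured, so the sketch below assumes that baseline-style scoring with diagonal covariances. All function names and the tiny hand-set model parameters are hypothetical; a real system would train the GMMs with EM on acoustic features.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log-density of x under a Gaussian with diagonal covariance `var`."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def gmm_log_likelihood(frame, gmm):
    """Frame log-likelihood under a GMM given as [(weight, mean, var), ...]."""
    return log_sum_exp(
        [math.log(w) + log_gaussian_diag(frame, m, v) for w, m, v in gmm]
    )


def llr_score(frames, genuine_gmm, spoof_gmm):
    """Average per-frame log-likelihood ratio; positive values favor 'genuine'."""
    total = sum(
        gmm_log_likelihood(f, genuine_gmm) - gmm_log_likelihood(f, spoof_gmm)
        for f in frames
    )
    return total / len(frames)


# Hypothetical 2-D models with hand-set parameters (not trained, not from the paper).
genuine_gmm = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, -1.0], [0.5, 0.5])]
spoof_gmm = [(1.0, [3.0, 3.0], [1.0, 1.0])]

utterance = [[0.1, -0.2], [0.8, -0.9], [0.0, 0.3]]
print(f"LLR score: {llr_score(utterance, genuine_gmm, spoof_gmm):.3f}")
```

Averaging over frames makes scores comparable across utterances of different lengths; thresholding this score is what produces the false acceptance and false rejection rates behind the EER.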

Author countries

United Kingdom
