A Study On Convolutional Neural Network Based End-To-End Replay Anti-Spoofing

Authors: Bhusan Chettri, Saumitra Mishra, Bob L. Sturm, Emmanouil Benetos

Published: 2018-05-22 14:53:13+00:00

Comment: 6 pages

AI Summary

This paper studies the performance of Convolutional Neural Networks (CNNs) in an end-to-end setting for replay attack detection within the ASVspoof 2017 challenge. The authors find that existing CNN architectures exhibit poor generalization on the evaluation dataset compared to development data. They propose a compact CNN architecture and investigate factors affecting generalization, highlighting challenges related to data differences and limited training data.

Abstract

The second Automatic Speaker Verification Spoofing and Countermeasures challenge (ASVspoof 2017) focused on replay attack detection. The best deep-learning systems to compete in ASVspoof 2017 used Convolutional Neural Networks (CNNs) as a feature extractor. In this paper, we study their performance in an end-to-end setting. We find that these architectures show poor generalization in the evaluation dataset, but find a compact architecture that shows good generalization on the development data. We demonstrate that for this dataset it is not easy to obtain a similar level of generalization on both the development and evaluation data. This leads to a variety of open questions about what the differences are in the data; why these are more evident in an end-to-end setting; and how these issues can be overcome by increasing the training data.


Key findings
The study found that CNN-based end-to-end systems consistently showed poor generalization on the ASVspoof 2017 evaluation dataset, despite achieving good performance on the development set. This generalization gap appeared both in the replicated state-of-the-art architectures and in the authors' proposed compact CNN. The findings raise questions about the underlying data differences between subsets and the challenges of training robust deep models with limited and potentially imbalanced data.
Approach
The researchers investigate several CNN architectures for end-to-end replay anti-spoofing, replicating top-performing systems from ASVspoof 2017 and proposing a novel compact CNN. They train these models using log power magnitude spectrograms and analyze their generalization performance across development and evaluation datasets. The study also explores the impact of different activation functions, batch sizes, and spectrogram representations.
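As a rough illustration of the input representation mentioned above, a log power magnitude spectrogram can be computed with a short-time Fourier transform. The sketch below uses NumPy only; the function names and the window length, hop size, flooring constant, and normalisation scheme are hypothetical choices for illustration, not the settings used in the paper.

```python
import numpy as np


def log_power_spectrogram(signal, n_fft=512, hop=160, eps=1e-10):
    """Return a (frames, n_fft // 2 + 1) log power spectrogram.

    n_fft, hop, and eps are illustrative values (assumptions),
    not the paper's configuration.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < n_fft:
        # Zero-pad short signals so that at least one frame exists.
        signal = np.pad(signal, (0, n_fft - signal.size))
    window = np.hanning(n_fft)
    n_frames = 1 + (signal.size - n_fft) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # Power spectrum of each frame, floored by eps before taking the log.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + eps)


def normalise(spec):
    """Mean-variance normalisation over the whole spectrogram (an assumption)."""
    return (spec - spec.mean()) / (spec.std() + 1e-10)
```

For a one-second signal sampled at 16 kHz, these defaults give a matrix of 97 frames by 257 frequency bins, which could then be cropped or padded to a fixed size before being fed to a CNN.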
Datasets
ASVspoof 2017 (version 1)
Model(s)
Convolutional Neural Networks (CNNs): architectures replicated from [4] (LCNN_FFT), [5] (Model 1), and [19] ('Bulbul', as Model 2), plus a newly proposed compact CNN architecture (Model 3).
Author countries
United Kingdom