End-to-end anti-spoofing with RawNet2

Authors: Hemlata Tak, Jose Patino, Massimiliano Todisco, Andreas Nautsch, Nicholas Evans, Anthony Larcher

Published: 2020-11-02 16:40:52+00:00

Comment: Accepted to ICASSP 2021

AI Summary

This paper presents the first application of RawNet2, a deep neural network that ingests raw audio, to anti-spoofing for automatic speaker verification. It describes the modifications made to the original RawNet2 architecture to adapt it for spoofing detection. For the challenging A17 voice conversion attack, the system's results are the second-best reported, and fusing it with baseline countermeasures gives the second-best reported results for the full ASVspoof 2019 logical access condition.

Abstract

Spoofing countermeasures aim to protect automatic speaker verification systems from attempts to manipulate their reliability with the use of spoofed speech signals. While results from the most recent ASVspoof 2019 evaluation show great potential to detect most forms of attack, some continue to evade detection. This paper reports the first application of RawNet2 to anti-spoofing. RawNet2 ingests raw audio and has potential to learn cues that are not detectable using more traditional countermeasure solutions. We describe modifications made to the original RawNet2 architecture so that it can be applied to anti-spoofing. For A17 attacks, our RawNet2 system results are the second-best reported, while the fusion of RawNet2 and baseline countermeasures gives the second-best results reported for the full ASVspoof 2019 logical access condition. Our results are reproducible with open source software.


Key findings

RawNet2 achieves the second-best reported results for the challenging A17 voice conversion attack, significantly outperforming the baseline. On pooled results, RawNet2 alone is inferior to the baseline, but fusing it with the baseline LFCC countermeasures yields the second-best overall results reported for the ASVspoof 2019 logical access condition, which suggests that RawNet2 learns complementary cues.
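
The summary above does not say how the fusion is performed, so the following is only a minimal sketch of one common form of score-level fusion: a weighted average of z-normalised scores from two countermeasures. It is not the authors' method. The function names, the `weight_a` parameter and the example scores are all hypothetical and chosen purely for illustration.

```python
from statistics import mean, pstdev


def z_normalise(scores):
    """Shift and scale a list of scores to zero mean and unit variance."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0.0:
        # All scores identical: nothing to scale, return zeros.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_scores(scores_a, scores_b, weight_a=0.5):
    """Weighted average of two z-normalised score lists (one score per trial).

    ASSUMPTION: simple linear fusion, not the scheme used in the paper.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must score the same trials")
    norm_a = z_normalise(scores_a)
    norm_b = z_normalise(scores_b)
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(norm_a, norm_b)]


# Hypothetical scores for four trials (higher = more likely bona fide).
raw_waveform_scores = [2.0, -1.0, 0.5, -3.0]
lfcc_baseline_scores = [10.0, 4.0, -2.0, -8.0]
fused = fuse_scores(raw_waveform_scores, lfcc_baseline_scores)
```

Normalising before averaging keeps one system from dominating only because its scores live on a larger scale. In practice the normalisation statistics and the weight would be estimated on development data rather than on the trials being scored.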

Approach

The authors apply and modify RawNet2, an end-to-end convolutional neural network that operates on raw audio waveforms, for anti-spoofing. The key modifications are fixing the sinc filter parameters to avoid overfitting with sparse training data, optimizing the filter length for spoofing cues, and using a larger number of kernel filters in the second residual block. These layers are followed by a GRU layer and a fully connected layer with softmax output for two-class classification (bona fide or spoof).
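
To illustrate what a fixed (non-learned) sinc filter bank over raw audio can look like, here is a minimal sketch in plain Python. Each filter is a Hamming-windowed band-pass kernel formed as the difference of two sinc low-pass filters, and its cut-off frequencies are computed once and never trained. The mel-spaced band edges, the 16 kHz sample rate, the filter count and the kernel length are assumptions made for the example, not values taken from the paper, and all function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def sinc(x):
    """Unnormalised sinc, sin(x)/x, with the limit value 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def mel_band_edges(num_filters, sample_rate):
    """Band edges in Hz, equally spaced on the mel scale (ASSUMED spacing)."""
    top = hz_to_mel(sample_rate / 2.0)
    return [mel_to_hz(k * top / num_filters) for k in range(num_filters + 1)]


def bandpass_kernel(f_low, f_high, length, sample_rate):
    """Windowed band-pass impulse response: difference of two sinc low-passes."""
    if length % 2 == 0:
        raise ValueError("use an odd length so the kernel has a centre tap")
    half = (length - 1) // 2
    f1 = f_low / sample_rate    # normalised lower cut-off
    f2 = f_high / sample_rate   # normalised upper cut-off
    kernel = []
    for i in range(length):
        n = i - half
        ideal = 2.0 * f2 * sinc(2.0 * math.pi * f2 * n) \
            - 2.0 * f1 * sinc(2.0 * math.pi * f1 * n)
        hamming = 0.54 - 0.46 * math.cos(2.0 * math.pi * i / (length - 1))
        kernel.append(ideal * hamming)
    return kernel


def fixed_sinc_bank(num_filters, length, sample_rate=16000):
    """Build the whole bank once; the cut-offs stay fixed during training."""
    edges = mel_band_edges(num_filters, sample_rate)
    return [bandpass_kernel(edges[k], edges[k + 1], length, sample_rate)
            for k in range(num_filters)]


def apply_filter(waveform, kernel):
    """Slide one kernel over a raw waveform ('valid' mode, no padding)."""
    span = len(kernel)
    return [sum(waveform[t + j] * kernel[j] for j in range(span))
            for t in range(len(waveform) - span + 1)]


# Illustrative sizes only: 8 filters of 65 taps applied to a 440 Hz tone.
bank = fixed_sinc_bank(num_filters=8, length=65)
tone = [math.sin(2.0 * math.pi * 440.0 * t / 16000) for t in range(400)]
first_band_output = apply_filter(tone, bank[0])
```

Because each kernel is fully determined by two cut-off frequencies, fixing those cut-offs removes the first layer's parameters from training, which is consistent with the overfitting concern described above. The kernel length sets the trade-off between frequency resolution and time resolution, which is why it is a natural quantity to tune.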

Datasets

ASVspoof 2019 logical access (LA) database

Model(s)

RawNet2 (modified), high-spectral-resolution LFCC-GMM baseline

Author countries

France