End-to-end anti-spoofing with RawNet2

Authors: Hemlata Tak, Jose Patino, Massimiliano Todisco, Andreas Nautsch, Nicholas Evans, Anthony Larcher

Published: 2020-11-02 16:40:52+00:00

AI Summary

This paper presents the first application of RawNet2, a deep neural network that operates on raw audio, to anti-spoofing for automatic speaker verification. The authors modify the original RawNet2 architecture so that it can detect spoofed speech, with particular attention to the challenging A17 attack. Although overall performance does not exceed that of a baseline, the system achieves the second-best reported results on the A17 attack and improves when fused with the baseline.

Abstract

Spoofing countermeasures aim to protect automatic speaker verification systems from attempts to manipulate their reliability with the use of spoofed speech signals. While results from the most recent ASVspoof 2019 evaluation show great potential to detect most forms of attack, some continue to evade detection. This paper reports the first application of RawNet2 to anti-spoofing. RawNet2 ingests raw audio and has potential to learn cues that are not detectable using more traditional countermeasure solutions. We describe modifications made to the original RawNet2 architecture so that it can be applied to anti-spoofing. For A17 attacks, the results of our RawNet2 systems are the second-best reported, while the fusion of RawNet2 and baseline countermeasures gives the second-best results reported for the full ASVspoof 2019 logical access condition. Our results are reproducible with open source software.


Key findings

While the overall performance of the modified RawNet2 system was slightly inferior to that of a baseline system, it achieved the second-best reported results for the A17 attack. Fusing RawNet2 with the baseline gave the second-best overall performance reported for the ASVspoof 2019 logical access condition. The RawNet2 system appears to learn different cues from those used by the baseline, which suggests that the two systems are complementary.

Approach

The authors adapted the RawNet2 architecture for anti-spoofing by modifying the first layer (using fixed sinc filters), adjusting filter lengths, and incorporating a fully connected layer before the output layer. They trained the model using the ASVspoof 2019 LA database and evaluated performance using the tandem detection cost function (t-DCF) metric.

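To illustrate what a first layer of fixed sinc filters might look like, the following is a minimal NumPy sketch, not the authors' implementation. It builds a bank of windowed band-pass sinc filters whose cut-off frequencies are set once and never trained, then convolves a raw waveform with each filter. The function names, the number of filters, the filter length, the linear band spacing, and the Hamming window are all assumptions chosen for illustration; the summary above does not specify them.

```python
import numpy as np


def sinc_filterbank(num_filters=20, filter_len=129, sample_rate=16000):
    """Build a bank of fixed (non-trainable) band-pass sinc filters.

    All default values are illustrative assumptions, not the paper's settings.
    """
    # Band edges spaced linearly from 0 Hz to the Nyquist frequency.
    edges_hz = np.linspace(0.0, sample_rate / 2, num_filters + 1)
    # Time axis in samples, centred on zero so each filter is symmetric.
    n = np.arange(filter_len) - (filter_len - 1) / 2
    window = np.hamming(filter_len)

    bank = np.empty((num_filters, filter_len))
    for i in range(num_filters):
        # Cut-off frequencies in cycles per sample.
        f1 = edges_hz[i] / sample_rate
        f2 = edges_hz[i + 1] / sample_rate
        # A band-pass response is the difference of two ideal low-pass
        # responses. np.sinc is the normalized sinc: sin(pi x) / (pi x).
        band_pass = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
        bank[i] = band_pass * window
    return bank


def apply_filterbank(waveform, bank):
    """Convolve a 1-D raw waveform with every filter in the bank."""
    return np.stack([np.convolve(waveform, h, mode="same") for h in bank])
```

With these assumed defaults, `apply_filterbank(waveform, sinc_filterbank())` returns an array of shape `(20, len(waveform))`, one band-limited signal per filter, which later layers of a network could consume. Because the cut-off frequencies are computed once and stored as constants, nothing in this layer would be updated during training.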
Datasets

ASVspoof 2019 logical access (LA) database

Model(s)

Modified RawNet2 architecture

Author countries

France