ASSERT: Anti-Spoofing with Squeeze-Excitation and Residual neTworks

Authors: Cheng-I Lai, Nanxin Chen, Jesús Villalba, Najim Dehak

Published: 2019-04-01 21:47:00+00:00

AI Summary

The paper introduces ASSERT, a system for audio spoofing detection submitted to the ASVspoof 2019 Challenge. It uses variants of squeeze-excitation and residual networks, and reports more than 93% and 17% relative improvements over the baseline systems in the Physical Access and Logical Access sub-challenges, respectively.

Abstract

We present JHU's system submission to the ASVspoof 2019 Challenge: Anti-Spoofing with Squeeze-Excitation and Residual neTworks (ASSERT). Anti-spoofing has gathered more and more attention since the inauguration of the ASVspoof Challenges, and ASVspoof 2019 is dedicated to addressing attacks of all three major types: text-to-speech, voice conversion, and replay. Built upon previous research on Deep Neural Networks (DNNs), ASSERT is a pipeline for a DNN-based approach to anti-spoofing. ASSERT has four components: feature engineering, DNN models, network optimization, and system combination, where the DNN models are variants of squeeze-excitation and residual networks. We conducted an ablation study of the effectiveness of each component on the ASVspoof 2019 corpus, and experimental results showed that ASSERT obtained more than 93% and 17% relative improvements over the baseline systems in the two sub-challenges of ASVspoof 2019, ranking ASSERT among the top-performing systems. Code and pretrained models will be made publicly available.


Key findings
ASSERT achieved substantial performance gains over baseline systems, obtaining more than 93% and 17% relative improvements in the Physical Access and Logical Access sub-challenges respectively. The fusion system ranked among the top performers in the ASVspoof 2019 challenge.
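For reference, a relative improvement expresses the reduction in an error metric as a fraction of the baseline's value. The sketch below assumes a lower-is-better metric (for example, an equal error rate). The function name and the input numbers are hypothetical and serve only to illustrate the arithmetic; they are not results from the paper.

```python
def relative_improvement(baseline: float, system: float) -> float:
    """Percent reduction of a lower-is-better metric relative to the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - system) / baseline


# Hypothetical metric values, chosen only to show the calculation.
print(relative_improvement(10.0, 0.5))  # 95.0
```

Under this definition, an improvement above 93% means the system's error is less than 7% of the baseline's error.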
Approach
ASSERT employs a pipeline approach combining feature engineering (CQCC and logspec features), deep neural network models (variants of SENet and ResNet with statistical pooling), network optimization (Adam optimizer and multi-class/binary classification), and system combination (logistic regression fusion).
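The system combination step can be illustrated with a minimal score-level fusion sketch: each subsystem emits one score per utterance, and a logistic regression learns a weight per subsystem plus a bias. This is an assumption-laden toy in pure Python, not the authors' implementation; the function names, the label convention (1 for bona fide, 0 for spoof), the learning rate, and the development scores are all hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fuse(x, w, b):
    """Fused log-odds score for one utterance's vector of subsystem scores."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def fit_fusion(scores, labels, lr=0.1, epochs=2000):
    """Fit fusion weights and bias by batch gradient descent on the logistic loss.

    scores: one score vector per utterance, one entry per subsystem.
    labels: 1 for bona fide, 0 for spoof (assumed convention).
    """
    n, n_sys = len(scores), len(scores[0])
    w, b = [0.0] * n_sys, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_sys, 0.0
        for x, y in zip(scores, labels):
            err = sigmoid(fuse(x, w, b)) - y  # gradient of the loss w.r.t. the logit
            for j in range(n_sys):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical development scores from two subsystems.
dev_scores = [[2.0, 1.5], [1.0, 2.5], [-1.5, -0.5], [-2.0, -1.0]]
dev_labels = [1, 1, 0, 0]
weights, bias = fit_fusion(dev_scores, dev_labels)
fused = [fuse(x, weights, bias) for x in dev_scores]
```

A positive fused score then favors the bona fide class and a negative one favors spoof. In practice the fusion would be trained on held-out development scores rather than on the data used to train the subsystems.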
Datasets
ASVspoof 2019 corpus (Physical Access and Logical Access sub-challenges)
Model(s)
Variants of Squeeze-Excitation Networks (SENet34, SENet50), Mean-Std ResNet, Dilated ResNet, and Attentive-Filtering Network.
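The squeeze-excitation idea behind the SENet variants can be sketched as three steps: squeeze each channel to a single average, pass the channel descriptors through a small bottleneck network to obtain per-channel gates in (0, 1), and rescale each channel by its gate. The pure-Python sketch below assumes a feature map stored as a nested list of shape [C][H][W]; the function name, parameter names, and toy weights are hypothetical, and the exact block layout in the paper's models may differ.

```python
import math


def squeeze_excite(feature_map, w1, b1, w2, b2):
    """Apply squeeze-and-excitation channel gating to a [C][H][W] nested list.

    w1: [C_reduced][C] weights, b1: [C_reduced] biases (bottleneck layer, ReLU).
    w2: [C][C_reduced] weights, b2: [C] biases (gating layer, sigmoid).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: bottleneck fully connected layer with ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(w_row, squeezed)) + b)
        for w_row, b in zip(w1, b1)
    ]
    # Excitation, step 2: expand back to C channels and squash to (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(w_row, hidden)) + b)))
        for w_row, b in zip(w2, b2)
    ]
    # Scale: reweight every element of each channel by that channel's gate.
    return [
        [[g * v for v in row] for row in channel]
        for channel, g in zip(feature_map, gates)
    ]


# Toy input: 2 channels of size 2x2, bottleneck width 1, all-zero parameters.
x = [[[1.0, 2.0], [3.0, 4.0]], [[4.0, 6.0], [8.0, 10.0]]]
out = squeeze_excite(x, w1=[[0.0, 0.0]], b1=[0.0], w2=[[0.0], [0.0]], b2=[0.0, 0.0])
```

With all-zero parameters every gate equals sigmoid(0) = 0.5, so each channel is simply halved; trained weights would instead emphasize informative channels and suppress the rest.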
Author countries
USA