Detecting Spoofing Attacks Using VGG and SincNet: BUT-Omilia Submission to ASVspoof 2019 Challenge

Authors: Hossein Zeinali, Themos Stafylakis, Georgia Athanasopoulou, Johan Rohdin, Ioannis Gkinis, Lukáš Burget, Jan "Honza" Černocký

Published: 2019-07-13 17:27:40+00:00

AI Summary

This paper describes the BUT-Omilia system for the ASVspoof 2019 challenge, focusing on detecting spoofing attacks in speaker verification. For physical access (PA), a fusion of two VGG networks is used, while for logical access (LA), a fusion of VGG and SincNet is employed. The PA system showed significant improvement over the baseline, while the LA system struggled to generalize to unseen attacks.

Abstract

In this paper, we present the system description of the joint efforts of Brno University of Technology (BUT) and Omilia -- Conversational Intelligence for the ASVspoof 2019 Spoofing and Countermeasures Challenge. The primary submission for Physical access (PA) is a fusion of two VGG networks, trained on single- and two-channel features. For Logical access (LA), our primary system is a fusion of VGG and the recently introduced SincNet architecture. The results on PA show that the proposed networks yield very competitive performance in all conditions and achieve an 86% relative improvement over the official baseline. On the other hand, the results on LA show that although the proposed architecture and training strategy perform very well on certain spoofing attacks, they fail to generalize to certain attacks that are unseen during training.


Key findings
The VGG-based system for physical access achieved an 86% relative improvement over the official baseline. The logical access system, a fusion of VGG and SincNet, performed very well on certain spoofing attacks but failed to generalize to certain attacks unseen during training. The results highlight how difficult it is for deep learning anti-spoofing models to generalize when training and test data are significantly mismatched.
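To make the "relative improvement" figure concrete, here is a minimal sketch of how such a number is typically computed for an error metric where lower is better. The function name and the input values are hypothetical and chosen only for illustration; they are not the challenge's baseline or submission results, and the summary does not say which metric the 86% refers to.

```python
def relative_improvement(baseline_error: float, system_error: float) -> float:
    """Fractional reduction in error relative to the baseline (lower error is better)."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - system_error) / baseline_error


# Hypothetical values in arbitrary units, NOT results from the paper.
example = relative_improvement(baseline_error=100.0, system_error=14.0)
print(f"{example:.0%}")
```

Under this definition, an 86% relative improvement means the system's error is 14% of the baseline's error.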
Approach
The authors use a fusion of deep neural networks (DNNs), specifically VGG and SincNet, for spoofing detection. For physical access, two VGG networks are trained on different audio features and their outputs are fused. For logical access, a VGG network and two SincNet networks (differing in dropout rates) are fused.
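The fusion step described above can be sketched at the score level. This is only an illustration under stated assumptions: the summary does not say how the outputs are combined, so a fixed weighted sum of per-utterance scores is assumed here, and the function name, weights, and scores are all hypothetical.

```python
from typing import List, Sequence


def fuse_scores(system_scores: Sequence[Sequence[float]],
                weights: Sequence[float]) -> List[float]:
    """Combine per-utterance scores from several systems by a weighted sum.

    system_scores[k][i] is the score that system k assigns to utterance i.
    The weighted sum is an assumption; the actual fusion method is not
    specified in the summary.
    """
    if len(system_scores) != len(weights):
        raise ValueError("need exactly one weight per system")
    n_utts = len(system_scores[0])
    if any(len(scores) != n_utts for scores in system_scores):
        raise ValueError("all systems must score the same utterances")
    return [
        sum(w * scores[i] for w, scores in zip(weights, system_scores))
        for i in range(n_utts)
    ]


# Hypothetical scores for two utterances from three systems
# (e.g. one VGG and two SincNet variants for logical access).
vgg = [2.0, -1.0]
sincnet_a = [1.0, -3.0]
sincnet_b = [3.0, -2.0]
fused = fuse_scores([vgg, sincnet_a, sincnet_b], weights=[0.5, 0.25, 0.25])
print(fused)
```

In practice, fusion weights are usually tuned on a development set rather than fixed by hand. The same function covers the physical access case by passing the two VGG score lists with two weights.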
Datasets
ASVspoof 2019 challenge dataset.
Model(s)
VGG, SincNet
Author countries
Czechia, Greece