Replay Spoofing Countermeasure Using Autoencoder and Siamese Network on ASVspoof 2019 Challenge
Authors: Mohammad Adiban, Hossein Sameti, Saeedreza Shehnepoor
Published: 2019-10-29 16:03:04+00:00
AI Summary
This paper presents a novel replay spoofing countermeasure for Automatic Speaker Verification (ASV) systems. Constant Q Cepstral Coefficient (CQCC) features are passed through an autoencoder to obtain more informative features and to capture the noise information of spoofed utterances, and a Siamese network then performs classification. On the ASVspoof 2019 challenge dataset, the system improves on the baseline by 10.73% in Equal Error Rate (EER) and 0.2344 in Tandem Detection Cost Function (t-DCF).
Abstract
Automatic Speaker Verification (ASV) is the process of verifying a person's claimed identity based on the voice presented to a system. Various spoofing approaches can deceive ASV systems (ASVs), whether by imitating a voice or by reconstructing its features. Attackers try to defeat ASVs using four general techniques: impersonation, speech synthesis, voice conversion, and replay. Replay is considered a common and high-potential spoofing tool, since replay attacks are more accessible and require no technical knowledge from adversaries. In this study, we introduce a novel replay spoofing countermeasure for ASVs. We feed Constant Q Cepstral Coefficient (CQCC) features into an autoencoder to obtain more informative features and to take the noise information of spoofed utterances into account for discrimination. Finally, different configurations of the Siamese network are used for classification, for the first time in this context. Experiments on the ASVspoof 2019 challenge dataset, with Equal Error Rate (EER) and Tandem Detection Cost Function (t-DCF) as evaluation metrics, show that the proposed system improves on the baseline by 10.73% and 0.2344 in terms of EER and t-DCF, respectively.
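As a rough illustration of the EER metric mentioned above, the following minimal sketch sweeps a decision threshold over countermeasure scores and reports the point where the false acceptance and false rejection rates are closest. It is not the official ASVspoof scoring tool: the function name `equal_error_rate`, the convention that higher scores mean "more likely bona fide", and the toy scores are all assumptions made for this example.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping thresholds over the observed scores.

    Assumption: a higher score means the utterance is more likely bona fide.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer, best_thr = float("inf"), 1.0, None
    for thr in thresholds:
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(s < thr for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed trials scored at or above the threshold.
        far = sum(s >= thr for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # With finitely many trials FAR and FRR rarely match exactly,
            # so report their average at the closest crossing point.
            best_gap, eer, best_thr = gap, (far + frr) / 2, thr
    return eer, best_thr


# Hypothetical toy scores, invented purely for illustration.
bonafide = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
eer, thr = equal_error_rate(bonafide, spoof)
print(f"EER = {eer:.2%} at threshold {thr}")
```

In this toy case one bona fide trial falls below the chosen threshold and one spoofed trial reaches it, so both error rates are one in four. The t-DCF additionally weighs countermeasure errors against the errors and costs of the ASV system itself, and is not covered by this sketch.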