Voice Spoofing Detection Corpus for Single and Multi-order Audio Replays

Authors: Roland Baumann, Khalid Mahmood Malik, Ali Javed, Andersen Ball, Brandon Kujawa, Hafiz Malik

Published: 2019-09-03 03:26:26+00:00

AI Summary

This paper introduces a novel voice spoofing detection corpus (VSDC) containing bona fide, first-order, and second-order replay audio samples, addressing the limitations of existing datasets that lack multi-order replay data and diverse recording conditions. VSDC is designed to evaluate anti-spoofing algorithms in multi-hop scenarios and includes audio from fifteen speakers recorded using various microphones and environments.

Abstract

The evolution of modern voice-controlled devices (VCDs) in recent years has revolutionized the Internet of Things (IoT) and led to the wider realization of smart homes, personalization, and home automation through voice commands. The introduction of VCDs into IoT is expected to give rise to a new subfield of IoT called the Multimedia of Things (MoT). In an IoT-driven environment, these VCDs can be exploited to generate various spoofing attacks, including replay attacks. A replay attack is generated by replaying recorded audio of a legitimate human speaker with the intent of deceiving VCDs equipped with a speaker verification system. The connectivity among VCDs can easily be exploited to generate a chain of replay attacks (multi-order replay attacks). Existing spoofing detection datasets such as ASVspoof and ReMASC contain only first-order replay recordings of the bona fide audio samples. These datasets therefore cannot be used to evaluate anti-spoofing algorithms intended to detect multi-order replay attacks. Additionally, these datasets do not capture the characteristics of microphone arrays, which are an important component of modern VCDs. A diverse replay spoofing detection corpus is needed that contains multi-order replay recordings of bona fide voice samples. This paper presents a novel voice spoofing detection corpus (VSDC) to evaluate the performance of multi-order replay anti-spoofing methods. The proposed VSDC consists of first-order and second-order replay samples of the bona fide audio recordings. The VSDC can also be used to evaluate speaker verification systems, as the corpus includes audio samples from fifteen different speakers. To the best of our knowledge, this is the first publicly available replay spoofing detection corpus comprising first-order and second-order replay samples.


Key findings
Existing anti-spoofing models performed significantly worse on the VSDC dataset than on the ASVspoof datasets, highlighting the need for more diverse training data. Training models on a combination of ASVspoof and VSDC improved performance, demonstrating the value of VSDC. With the ASVspoof baseline (CQCC-GMM) model, second-order replay attacks proved easier to detect than first-order attacks.
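The summary does not name the metric behind "performed significantly worse"; replay-detection results on ASVspoof-style benchmarks are commonly reported as an equal error rate (EER). As an assumption, the minimal sketch below shows how an EER can be computed from detector scores, with the convention that a higher score means "more likely bona fide". The function name `compute_eer` is hypothetical.

```python
def compute_eer(bona_scores, spoof_scores):
    """Approximate equal error rate (EER) from two lists of detector scores.

    Assumed convention: higher score = more likely bona fide.
    Returns (eer, threshold) at the candidate threshold where the false
    rejection rate (FRR) and false acceptance rate (FAR) are closest.
    """
    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(bona_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # FRR: bona fide trials wrongly rejected (scored below the threshold).
        frr = sum(s < t for s in bona_scores) / len(bona_scores)
        # FAR: spoofed (replayed) trials wrongly accepted (at or above it).
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            # Average the two rates at the closest crossing point.
            best = (gap, (frr + far) / 2.0, t)
    return best[1], best[2]


# Perfectly separated scores give an EER of 0.
eer, threshold = compute_eer([2.0, 3.0, 4.0], [-1.0, 0.0, 1.0])
print(eer, threshold)
```

Because thresholds are restricted to observed scores, this is a discrete approximation; with few trials, FRR and FAR may never be exactly equal, so the two rates are averaged at their closest point.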
Approach
The authors created a new dataset (VSDC) by recording bona fide audio, then replaying it through various devices and scenarios to create first and second-order replay samples. This dataset is designed to be more diverse than existing datasets, including variations in recording environments, microphones, and replay chains.
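VSDC was built by physically replaying and re-recording audio, not by simulation. Still, a simple signal model can show why each extra hop leaves additional traces. The sketch below swaps in a common simplification: it treats each replay hop (loudspeaker playback plus microphone re-recording) as convolution with a channel impulse response, followed by additive Gaussian noise. All function names, impulse responses, and noise levels are hypothetical and are not taken from the paper.

```python
import random


def convolve(signal, impulse_response):
    """Full linear convolution of two sample lists (pure Python)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out


def replay_hop(signal, channel_ir, noise_std, rng):
    """One hypothetical replay hop: linear channel plus additive noise."""
    recorded = convolve(signal, channel_ir)
    return [s + rng.gauss(0.0, noise_std) for s in recorded]


def replay_chain(bona_fide, hops, rng):
    """Apply hops in sequence: one hop ~ first-order, two hops ~ second-order.

    `hops` is a list of (channel_ir, noise_std) pairs, one per replay device.
    """
    audio = list(bona_fide)
    for channel_ir, noise_std in hops:
        audio = replay_hop(audio, channel_ir, noise_std, rng)
    return audio


rng = random.Random(0)
bona = [1.0, 0.0, 0.0, 0.0]
hop_a = ([0.9, 0.3], 0.0)        # hypothetical first device
hop_b = ([0.8, 0.2, 0.1], 0.0)   # hypothetical second device
first_order = replay_chain(bona, [hop_a], rng)
second_order = replay_chain(bona, [hop_a, hop_b], rng)
print(len(first_order), len(second_order))
```

In this model, a second-order sample carries the combined response of both channels (and two layers of noise when `noise_std > 0`). Real replay chains also involve nonlinear loudspeaker distortion and room acoustics, which this linear sketch ignores.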
Datasets
ASVspoof 2017, ASVspoof 2019, and the newly created VSDC dataset.
Model(s)
CQCC-GMM (constant Q cepstral coefficient features with a Gaussian mixture model classifier)
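In a CQCC-GMM detector, one GMM is trained on bona fide features and one on spoofed features. A common way to score an utterance is the average per-frame log-likelihood ratio between the two models. The sketch below shows only this scoring step, under these assumptions: CQCC frames are already extracted, both GMMs (diagonal covariances) are already trained, for example with EM, and each GMM is represented as a list of `(weight, mean, variance)` tuples. All function names are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_frame_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    # Log-sum-exp for numerical stability.
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def llr_score(frames, bona_gmm, spoof_gmm):
    """Average per-frame log-likelihood ratio; positive values favour bona fide."""
    total = sum(
        gmm_frame_loglik(f, bona_gmm) - gmm_frame_loglik(f, spoof_gmm)
        for f in frames
    )
    return total / len(frames)


# Toy 2-D "feature" models standing in for trained CQCC GMMs.
bona_gmm = [(1.0, [0.0, 0.0], [1.0, 1.0])]
spoof_gmm = [(1.0, [3.0, 3.0], [1.0, 1.0])]
print(llr_score([[0.0, 0.0]], bona_gmm, spoof_gmm))
```

Thresholding this score yields the bona fide/replay decision. Score lists produced this way could then feed an error-rate metric such as EER.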
Author countries
USA