A Study On Data Augmentation In Voice Anti-Spoofing

Authors: Ariel Cohen, Inbal Rimon, Eran Aflalo, Haim Permuter

Published: 2021-10-20 11:09:05+00:00

AI Summary

This paper studies how data augmentation improves the detection of synthetic or spoofed audio in voice anti-spoofing. The authors propose augmentation methods that address channel variability, audio compression, and unseen spoofing attacks, and report state-of-the-art performance in the Deep Fake (DF) category of the ASVspoof 2021 challenge.

Abstract

In this paper, we perform an in-depth study of how data augmentation techniques improve synthetic or spoofed audio detection. Specifically, we propose methods to deal with channel variability, different audio compressions, different band-widths, and unseen spoofing attacks, which have all been shown to significantly degrade the performance of audio-based systems and Anti-Spoofing systems. Our results are based on the ASVspoof 2021 challenge, in the Logical Access (LA) and Deep Fake (DF) categories. Our study is Data-Centric, meaning that the models are fixed and we significantly improve the results by making changes in the data. We introduce two forms of data augmentation - compression augmentation for the DF part, compression & channel augmentation for the LA part. In addition, a new type of online data augmentation, SpecAverage, is introduced in which the audio features are masked with their average value in order to improve generalization. Furthermore, we introduce a Log spectrogram feature design that improved the results. Our best single system and fusion scheme both achieve state-of-the-art performance in the DF category, with an EER of 15.46% and 14.46% respectively. Our best system for the LA task reduced the best baseline EER by 50% and the min t-DCF by 16%. Our techniques to deal with spoofed data from a wide variety of distributions can be replicated and can help anti-spoofing and speech-based systems enhance their results.
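The abstract describes SpecAverage as masking audio features with their average value. A minimal sketch of that idea follows, assuming a SpecAugment-style layout with one frequency band and one time band per call and the mean taken over the whole feature matrix. The function name `spec_average`, its parameters, and the mask widths are illustrative assumptions, not the authors' implementation.

```python
import random

def spec_average(features, max_freq_width=2, max_time_width=2, rng=None):
    """Return a masked copy of `features` (rows = frequency bins, columns = frames).

    Assumption: one frequency band and one time band are overwritten with the
    mean of the whole matrix; the paper's exact masking policy may differ.
    """
    rng = rng or random.Random()
    n_freq = len(features)
    n_time = len(features[0])
    mean = sum(sum(row) for row in features) / (n_freq * n_time)
    out = [list(row) for row in features]  # copy so the input stays untouched

    # Frequency mask: overwrite a band of consecutive frequency bins.
    f_width = rng.randint(0, min(max_freq_width, n_freq))
    f_start = rng.randint(0, n_freq - f_width)
    for f in range(f_start, f_start + f_width):
        out[f] = [mean] * n_time

    # Time mask: overwrite a band of consecutive frames in every bin.
    t_width = rng.randint(0, min(max_time_width, n_time))
    t_start = rng.randint(0, n_time - t_width)
    for row in out:
        for t in range(t_start, t_start + t_width):
            row[t] = mean
    return out
```

Because the mask is drawn afresh on every call, applying it inside the data loader gives each training epoch a differently masked view of the same utterance, which is what makes the augmentation "online".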

Key findings

The proposed techniques, including SpecAverage and feature normalization, significantly improved the anti-spoofing systems while the models themselves were kept fixed. The best single system and the fusion scheme achieved state-of-the-art results in the Deep Fake (DF) category of the ASVspoof 2021 challenge, with EERs of 15.46% and 14.46%, respectively. The best system for the Logical Access (LA) task reduced the best baseline EER by 50% and the min t-DCF by 16%.
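The equal error rate (EER) quoted in these results is the operating point at which the false acceptance rate for spoofed audio equals the false rejection rate for bona fide audio. The sketch below shows one simple way to estimate it from two lists of scores. It assumes higher scores mean "more likely bona fide" and sweeps only the observed scores as thresholds; `compute_eer` is a hypothetical helper, not the official ASVspoof scoring tool.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Estimate the EER by sweeping every observed score as a threshold.

    Assumption: a trial is accepted as bona fide when its score >= threshold.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = None, None
    for th in thresholds:
        # False rejection: bona fide trials scored below the threshold.
        frr = sum(s < th for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= th for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

With this definition, an EER of 15.46% means that at the balanced threshold roughly 15 in 100 spoofed utterances are accepted and roughly 15 in 100 bona fide utterances are rejected.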

Approach

The authors take a data-centric approach: the models are held fixed, and the ASVspoof 2019 dataset is augmented with simulated compression, channel effects, and bandwidth variations. They also introduce SpecAverage, an online augmentation that masks audio features with their average value, together with feature normalization, to improve generalization and robustness.
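To illustrate what a simulated compression augmentation can look like, the sketch below passes a waveform through a lossy round trip and returns the degraded copy for training. It uses the continuous mu-law companding formula with 8-bit quantization as a simple telephony-style stand-in. This is a swapped-in example: the codecs and channel simulations actually used in the paper are not specified here, and `mu_law_round_trip` is a hypothetical helper.

```python
import math

MU = 255  # companding constant used by 8-bit mu-law telephony coding

def mu_law_round_trip(samples, mu=MU, levels=256):
    """Compress, quantize, and expand samples in [-1, 1]; return the lossy copy."""
    degraded = []
    for x in samples:
        x = max(-1.0, min(1.0, x))  # clip to the valid range
        # Compress: logarithmic companding of the magnitude.
        y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
        # Quantize the companded value to `levels` uniform steps.
        q = round((y + 1.0) / 2.0 * (levels - 1))
        y_hat = q / (levels - 1) * 2.0 - 1.0
        # Expand: invert the companding curve.
        x_hat = math.copysign(((1.0 + mu) ** abs(y_hat) - 1.0) / mu, y_hat)
        degraded.append(x_hat)
    return degraded
```

In a data-centric setup, the degraded waveform would be added to the training set alongside the clean one, so the fixed model sees the same utterance under several transmission conditions.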

Datasets

ASVspoof 2019 (Logical Access), ASVspoof 2021 (Logical Access and Deep Fake categories, evaluation data only)

Model(s)

ResNet-34, SENet, One Class Softmax (OCS) ResNet

Author countries

Israel
