ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection

Authors: Massimiliano Todisco, Xin Wang, Ville Vestman, Md Sahidullah, Hector Delgado, Andreas Nautsch, Junichi Yamagishi, Nicholas Evans, Tomi Kinnunen, Kong Aik Lee

Published: 2019-04-09 15:12:52+00:00

Journal Ref: Proc. Interspeech 2019

AI Summary

The ASVspoof 2019 challenge introduced new databases and protocols to benchmark countermeasures against advanced spoofing attacks in automatic speaker verification (ASV). It considered both logical access (LA) and physical access (PA) scenarios with synthetic, converted, and replayed speech attacks, adopting the ASV-centric tandem detection cost function (t-DCF) as the primary evaluation metric. The challenge showcased significant progress in spoofed and fake audio detection, with over half of the 63 participating teams outperforming the provided baseline countermeasures.

Abstract

ASVspoof, now in its third edition, is a series of community-led challenges which promote the development of countermeasures to protect automatic speaker verification (ASV) from the threat of spoofing. Advances in the 2019 edition include: (i) a consideration of both logical access (LA) and physical access (PA) scenarios and the three major forms of spoofing attack, namely synthetic, converted and replayed speech; (ii) spoofing attacks generated with state-of-the-art neural acoustic and waveform models; (iii) an improved, controlled simulation of replay attacks; (iv) use of the tandem detection cost function (t-DCF) that reflects the impact of both spoofing and countermeasures upon ASV reliability. Even if ASV remains the core focus, in retaining the equal error rate (EER) as a secondary metric, ASVspoof also embraces the growing importance of fake audio detection. ASVspoof 2019 attracted the participation of 63 research teams, with more than half of these reporting systems that improve upon the performance of two baseline spoofing countermeasures. This paper describes the 2019 database, protocols and challenge results. It also outlines major findings which demonstrate the real progress made in protecting against the threat of spoofing and fake audio.


Key findings
Many participant systems substantially outperformed the baseline countermeasures, achieving very low t-DCF and EER scores. Top-performing systems frequently used neural networks and classifier ensembles. The t-DCF metric showed that, although some attacks built on advanced neural waveform models strongly degrade ASV performance, effective countermeasures can still be developed, especially by fusing systems to cover diverse attack families.
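
The summary does not say how participants fused their systems. As a purely illustrative sketch of one common approach, score-level fusion, the snippet below z-normalizes the scores of each countermeasure and takes a weighted average. The function names, the weights and the toy scores are all assumptions made for this example, not details from the challenge.

```python
import statistics

def z_normalize(scores):
    """Shift and scale one system's scores to zero mean and unit variance."""
    mean = statistics.mean(scores)
    std = statistics.pstdev(scores)
    if std == 0:
        # A constant-score system carries no information; map it to zeros.
        return [0.0 for _ in scores]
    return [(s - mean) / std for s in scores]

def fuse_scores(system_scores, weights=None):
    """Weighted average of z-normalized scores (hypothetical fusion rule).

    system_scores: list of score lists, one per system, all covering the
    same trials in the same order.
    """
    if weights is None:
        weights = [1.0 / len(system_scores)] * len(system_scores)
    normalized = [z_normalize(scores) for scores in system_scores]
    n_trials = len(normalized[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, normalized))
        for i in range(n_trials)
    ]

# Toy scores from two hypothetical systems on three trials.
fused = fuse_scores([[0.0, 1.0, 2.0], [10.0, 20.0, 30.0]])
```

Normalizing first keeps a system with a large score range from dominating the average. In practice the fusion weights would be tuned on development data rather than set to equal values.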
Approach
This paper describes the ASVspoof 2019 challenge, which involved creating new databases for logical access (synthetic and converted speech) and physical access (replayed speech) scenarios, with attacks generated using state-of-the-art neural acoustic and waveform models. It established evaluation protocols, including the t-DCF metric for assessing the joint impact of spoofing and countermeasures on ASV systems, and provided two baseline countermeasures for participants to improve upon.
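
To make the two metrics concrete, the sketch below computes an EER from countermeasure scores and a simplified weighted detection cost of the form beta * P_miss + P_fa, minimized over the decision threshold. It assumes higher scores mean "more likely bona fide". The constant `beta` is treated here as a given input; in the actual t-DCF it is derived from cost parameters, priors and ASV error rates defined in the paper and not reproduced in this summary. The function names are hypothetical.

```python
def error_rates(bonafide, spoof):
    """Miss and false-alarm rates of a countermeasure at every threshold.

    A trial is accepted as bona fide when its score is >= the threshold.
    """
    thresholds = sorted(set(bonafide) | set(spoof)) + [float("inf")]
    # Miss: a bona fide trial scored below the threshold (wrongly rejected).
    p_miss = [sum(s < t for s in bonafide) / len(bonafide) for t in thresholds]
    # False alarm: a spoofed trial scored at or above it (wrongly accepted).
    p_fa = [sum(s >= t for s in spoof) / len(spoof) for t in thresholds]
    return thresholds, p_miss, p_fa

def compute_eer(bonafide, spoof):
    """Approximate EER: the point where miss and false-alarm rates are closest."""
    _, p_miss, p_fa = error_rates(bonafide, spoof)
    i = min(range(len(p_miss)), key=lambda k: abs(p_miss[k] - p_fa[k]))
    return (p_miss[i] + p_fa[i]) / 2

def min_weighted_cost(bonafide, spoof, beta):
    """Minimum over thresholds of beta * P_miss + P_fa (simplified cost)."""
    _, p_miss, p_fa = error_rates(bonafide, spoof)
    return min(beta * m + f for m, f in zip(p_miss, p_fa))

# Toy scores: two bona fide and two spoofed trials with some overlap.
eer = compute_eer([1.0, 3.0], [0.0, 2.0])
cost = min_weighted_cost([1.0, 3.0], [0.0, 2.0], beta=2.0)
```

The EER depends only on the countermeasure, whereas a cost weighted by an ASV-dependent constant changes with the ASV system it protects, which is the motivation the abstract gives for adopting t-DCF as the primary metric.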
Datasets
ASVspoof 2019 database (derived from VCTK base corpus)
Model(s)
Common ASV system: x-vector speaker embeddings with PLDA backend. Baseline countermeasures: Gaussian Mixture Models (GMM) with Constant Q Cepstral Coefficient (CQCC) features (B01) and Linear Frequency Cepstral Coefficient (LFCC) features (B02). Top-performing participant systems primarily leveraged neural networks and ensemble classifiers.
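
A GMM countermeasure of this kind is typically scored with a log-likelihood ratio between a bona fide model and a spoof model. The sketch below assumes diagonal-covariance GMMs and per-utterance averaging over frames; the summary does not spell out these details, so treat them as assumptions. The model parameters are toy values chosen for illustration, not trained on ASVspoof data, and the 2-D vectors merely stand in for CQCC or LFCC frames.

```python
import math

def gmm_frame_loglik(x, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            ll += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        component_logs.append(ll)
    # Log-sum-exp over components for numerical stability.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(c - peak) for c in component_logs))

def llr_score(frames, bonafide_gmm, spoof_gmm):
    """Average per-frame log-likelihood ratio; higher means more bona fide-like."""
    total = 0.0
    for x in frames:
        total += gmm_frame_loglik(x, *bonafide_gmm) - gmm_frame_loglik(x, *spoof_gmm)
    return total / len(frames)

# Hypothetical models as (weights, means, variances) tuples.
bonafide_gmm = ([0.5, 0.5], [[1.0, 1.0], [2.0, 2.0]], [[1.0, 1.0], [1.0, 1.0]])
spoof_gmm = ([0.5, 0.5], [[-1.0, -1.0], [-2.0, -2.0]], [[1.0, 1.0], [1.0, 1.0]])

score = llr_score([[1.5, 1.5], [1.0, 2.0]], bonafide_gmm, spoof_gmm)
```

Under this sketch, swapping CQCC for LFCC features changes only the front-end that produces `frames`; the scoring back-end stays the same.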
Author countries
France, Japan, Finland, UK