ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection

Authors: Massimiliano Todisco, Xin Wang, Ville Vestman, Md Sahidullah, Hector Delgado, Andreas Nautsch, Junichi Yamagishi, Nicholas Evans, Tomi Kinnunen, Kong Aik Lee

Published: 2019-04-09 15:12:52+00:00

AI Summary

The ASVspoof 2019 challenge focused on advancing countermeasures against spoofing attacks in automatic speaker verification (ASV). The challenge incorporated logical and physical access scenarios, various spoofing attack types, and a new tandem detection cost function (t-DCF) metric to assess system performance holistically.

Abstract

ASVspoof, now in its third edition, is a series of community-led challenges which promote the development of countermeasures to protect automatic speaker verification (ASV) from the threat of spoofing. Advances in the 2019 edition include: (i) a consideration of both logical access (LA) and physical access (PA) scenarios and the three major forms of spoofing attack, namely synthetic, converted and replayed speech; (ii) spoofing attacks generated with state-of-the-art neural acoustic and waveform models; (iii) an improved, controlled simulation of replay attacks; (iv) use of the tandem detection cost function (t-DCF) that reflects the impact of both spoofing and countermeasures upon ASV reliability. While ASV remains the core focus, by retaining the equal error rate (EER) as a secondary metric ASVspoof also embraces the growing importance of fake audio detection. ASVspoof 2019 attracted the participation of 63 research teams, with more than half of these reporting systems that improve upon the performance of two baseline spoofing countermeasures. This paper describes the 2019 database, protocols and challenge results. It also outlines major findings which demonstrate the real progress made in protecting against the threat of spoofing and fake audio.
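As a reference for item (iv), the t-DCF of a countermeasure (CM) operating at threshold s can be written as below. This is our own transcription with our own symbol names, not text from the paper; the challenge evaluation plan should be treated as authoritative. Here P^cm_miss(s) is the fraction of bona fide trials the CM rejects, P^cm_fa(s) the fraction of spoofed trials it accepts, the ASV error rates are fixed at the ASV system's own threshold (P^asv_miss,spoof being the fraction of spoofed trials the ASV rejects), and the π and C terms are priors and costs.

```latex
\begin{aligned}
\mathrm{t\text{-}DCF}(s) &= C_1\,P^{\mathrm{cm}}_{\mathrm{miss}}(s) + C_2\,P^{\mathrm{cm}}_{\mathrm{fa}}(s),\\
C_1 &= \pi_{\mathrm{tar}}\left(C^{\mathrm{cm}}_{\mathrm{miss}} - C^{\mathrm{asv}}_{\mathrm{miss}}\,P^{\mathrm{asv}}_{\mathrm{miss}}\right) - \pi_{\mathrm{non}}\,C^{\mathrm{asv}}_{\mathrm{fa}}\,P^{\mathrm{asv}}_{\mathrm{fa}},\\
C_2 &= C^{\mathrm{cm}}_{\mathrm{fa}}\,\pi_{\mathrm{spoof}}\left(1 - P^{\mathrm{asv}}_{\mathrm{miss,spoof}}\right),\\
\mathrm{t\text{-}DCF}^{\min}_{\mathrm{norm}} &= \min_{s}\,\frac{\mathrm{t\text{-}DCF}(s)}{\min\{C_1,\,C_2\}}.
\end{aligned}
```

The denominator is the cost of the better of two trivial countermeasures: accepting everything (cost C_2) or rejecting everything (cost C_1). A normalized value of 1 therefore means the CM is no more useful than one of these trivial choices.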


Key findings
More than half of the 63 participating teams reported systems that improved upon the two baseline countermeasures, demonstrating real progress in spoofing detection. Results varied across attack types, highlighting the difficulty posed by sophisticated techniques such as end-to-end neural TTS. The t-DCF metric provided insight into the combined performance of the ASV system and the countermeasures.
Approach
The challenge evaluated spoofing countermeasures in tandem with ASV: each participant-developed countermeasure was combined with a fixed ASV system provided by the organizers. The primary metric was the t-DCF, which reflects the combined impact of spoofing and countermeasures on ASV reliability; the traditional EER was retained as a secondary metric.
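A minimal sketch of how both metrics could be computed from CM scores follows, in pure Python. It assumes that higher CM scores mean "bona fide", sweeps the threshold over the observed scores, and takes the fixed ASV system's error rates (measured once at its own threshold) as inputs. The cost and prior values in `ILLUSTRATIVE_COSTS` and every function name are assumptions of this sketch; it is not the official challenge scoring code.

```python
from typing import Dict, List, Sequence, Tuple

# Illustrative cost/prior values: an assumption of this sketch, not taken
# from the text above. Check the challenge evaluation plan before reuse.
ILLUSTRATIVE_COSTS: Dict[str, float] = {
    "p_spoof": 0.05,             # prior of a spoofing attack
    "p_tar": (1 - 0.05) * 0.99,  # prior of a target (genuine) trial
    "p_non": (1 - 0.05) * 0.01,  # prior of a zero-effort non-target trial
    "c_miss_asv": 1.0,           # cost: ASV rejects a target
    "c_fa_asv": 10.0,            # cost: ASV accepts a non-target
    "c_miss_cm": 1.0,            # cost: CM rejects bona fide speech
    "c_fa_cm": 10.0,             # cost: CM accepts a spoof
}


def cm_error_rates(bona: Sequence[float], spoof: Sequence[float],
                   thr: float) -> Tuple[float, float]:
    """CM accepts a trial as bona fide when its score >= thr.

    Returns (P_miss, P_fa): the fraction of bona fide trials rejected and
    the fraction of spoofed trials accepted at this threshold.
    """
    p_miss = sum(1 for x in bona if x < thr) / len(bona)
    p_fa = sum(1 for x in spoof if x >= thr) / len(spoof)
    return p_miss, p_fa


def thresholds(bona: Sequence[float], spoof: Sequence[float]) -> List[float]:
    """Every observed score, plus one value above all of them (reject all)."""
    scores = sorted(set(bona) | set(spoof))
    return scores + [scores[-1] + 1.0]


def eer(bona: Sequence[float], spoof: Sequence[float]) -> float:
    """Approximate EER: mean of P_miss and P_fa where they are closest."""
    rates = [cm_error_rates(bona, spoof, t) for t in thresholds(bona, spoof)]
    p_miss, p_fa = min(rates, key=lambda r: abs(r[0] - r[1]))
    return (p_miss + p_fa) / 2


def min_norm_tdcf(bona: Sequence[float], spoof: Sequence[float],
                  p_miss_asv: float, p_fa_asv: float, p_miss_spoof_asv: float,
                  cost: Dict[str, float] = ILLUSTRATIVE_COSTS) -> float:
    """Minimum normalized t-DCF of a CM paired with a fixed ASV system."""
    c1 = (cost["p_tar"] * (cost["c_miss_cm"] - cost["c_miss_asv"] * p_miss_asv)
          - cost["p_non"] * cost["c_fa_asv"] * p_fa_asv)
    c2 = cost["c_fa_cm"] * cost["p_spoof"] * (1.0 - p_miss_spoof_asv)
    if c1 <= 0 or c2 <= 0:
        raise ValueError("t-DCF weights must be positive; check ASV error rates")
    # Cost of the better trivial CM: accept all (C2) or reject all (C1).
    norm = min(c1, c2)
    return min((c1 * pm + c2 * pf) / norm
               for pm, pf in (cm_error_rates(bona, spoof, t)
                              for t in thresholds(bona, spoof)))
```

Because the threshold sweep includes both "accept all" and "reject all", one of which costs exactly 1 after normalization, this sketch never returns a minimum normalized t-DCF above 1; perfectly separated scores give 0 for both metrics.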
Datasets
The database is derived from the VCTK corpus and partitioned into training, development and evaluation sets for both the logical access (LA) and physical access (PA) scenarios. LA contains bona fide speech and spoofed speech generated by a variety of TTS and VC systems; PA simulates replay attacks in reverberant environments.
Model(s)
Various models were used by participating teams, including neural networks and ensembles of classifiers. The provided ASV system used x-vector speaker embeddings and a PLDA backend.
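The tandem arrangement, in which a participant's countermeasure is paired with the fixed ASV system, can be sketched as a simple cascade. This is an illustration under stated assumptions: the scores are hypothetical placeholders (the sketch does not model the x-vector/PLDA scoring itself), and the names `Trial`, `tandem_decision`, `cm_thr` and `asv_thr` are ours. We place the CM first; since a trial must pass both decisions to be accepted, the accept/reject outcome would be the same in the other order.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    cm_score: float   # hypothetical CM output: higher = more likely bona fide
    asv_score: float  # hypothetical ASV output: higher = more likely the claimed speaker


def tandem_decision(trial: Trial, cm_thr: float, asv_thr: float) -> str:
    """Accept only if the CM labels the trial bona fide AND the ASV accepts
    the identity claim; otherwise report which stage rejected it."""
    if trial.cm_score < cm_thr:
        return "rejected by CM"
    if trial.asv_score < asv_thr:
        return "rejected by ASV"
    return "accepted"


if __name__ == "__main__":
    for t in (Trial(2.1, 3.0), Trial(-1.5, 3.0), Trial(2.1, 0.4)):
        print(tandem_decision(t, cm_thr=0.0, asv_thr=1.0))
    # prints: accepted, rejected by CM, rejected by ASV (one per line)
```

Reporting which stage rejected a trial mirrors why the t-DCF is informative: a spoof stopped by the CM and a spoof that fools the CM but is rejected by the ASV carry different costs.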
Author countries
France, Japan, Finland, UK