Introduction to Voice Presentation Attack Detection and Recent Advances


Authors: Md Sahidullah, Hector Delgado, Massimiliano Todisco, Tomi Kinnunen, Nicholas Evans, Junichi Yamagishi, Kong-Aik Lee

Published: 2019-01-04 13:31:25+00:00

AI Summary

This research paper reviews recent advancements in voice presentation attack detection (PAD) for automatic speaker verification (ASV), focusing on studies from the last three years. It summarizes findings and lessons learned from two ASVspoof challenges, highlighting the continued need for generalized PAD solutions capable of detecting diverse spoofing attacks.

Abstract

Over the past few years, significant progress has been made in the field of presentation attack detection (PAD) for automatic speaker verification (ASV). This includes the development of new speech corpora, standard evaluation protocols and advancements in front-end feature extraction and back-end classifiers. The use of standard databases and evaluation protocols has enabled, for the first time, the meaningful benchmarking of different PAD solutions. This chapter summarises the progress, with a focus on studies completed in the last three years. It presents a summary of findings and lessons learned from two ASVspoof challenges, the first community-led benchmarking efforts. These show that ASV PAD remains an unsolved problem and that further attention is required to develop generalised PAD solutions with the potential to detect diverse and previously unseen spoofing attacks.

Key findings

The ASVspoof challenges revealed that ASV PAD remains an open problem. Generalized solutions are needed to address diverse and unseen attacks. The use of multiple features and classifier fusion generally improves performance.
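
To make the fusion finding concrete, the following is a minimal sketch of score-level fusion, assuming each countermeasure outputs one detection score per trial and that higher scores indicate genuine speech. The function names, the z-normalisation step and the equal default weights are illustrative assumptions, not the method of any particular challenge entry.

```python
from statistics import mean, pstdev


def z_normalise(scores):
    """Shift and scale one system's scores to zero mean and unit variance."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # A constant-score system carries no information; map it to zeros.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_scores(system_scores, weights=None):
    """Weighted sum of normalised scores from several countermeasures.

    system_scores: list of score lists, one per system, aligned by trial.
    weights: optional per-system weights (hypothetical; equal by default).
    """
    n_systems = len(system_scores)
    if weights is None:
        weights = [1.0 / n_systems] * n_systems
    normalised = [z_normalise(scores) for scores in system_scores]
    n_trials = len(normalised[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, normalised))
        for i in range(n_trials)
    ]


# Two hypothetical systems scoring the same three trials on different scales.
fused = fuse_scores([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]])
```

Normalising before summing keeps a system with a wide score range from dominating the fused score. In practice the weights would usually be tuned on development data rather than set equal.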

Approach

The paper surveys existing approaches to voice presentation attack detection, analyzing various spoofing techniques (impersonation, replay, speech synthesis, voice conversion) and their corresponding countermeasures. It highlights the use of standard databases and evaluation protocols from the ASVspoof challenges to benchmark different PAD solutions.
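
As an illustration of how PAD solutions can be benchmarked on a shared protocol, the sketch below approximates the equal error rate (EER) from two lists of detection scores. This summary does not name the metric, so treating EER as the benchmark measure is an assumption here, as are the function name and the convention that higher scores indicate genuine speech.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    A trial is accepted as genuine when its score is >= the threshold.
    Returns the average of the false acceptance and false rejection rates
    at the threshold where the two are closest.
    """
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best_gap = float("inf")
    eer = None
    for t in thresholds:
        # False rejection: genuine speech scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance: spoofed speech scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2
    return eer


# Hypothetical scores: one genuine trial and one spoofed trial overlap.
genuine = [0.9, 0.8, 0.7, 0.4]
spoof = [0.1, 0.2, 0.3, 0.6]
eer = equal_error_rate(genuine, spoof)
```

Because the same trial lists are scored by every system, a lower value from this function means a better countermeasure under that protocol. The threshold sweep is a coarse approximation and does not interpolate between operating points.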

Datasets

The ASVspoof 2015 and ASVspoof 2017 datasets, along with the SAS corpus (v1.0) and the RedDots corpus.

Model(s)

Gaussian mixture models (GMMs), support vector machines (SVMs) and deep neural networks (DNNs), including convolutional neural networks (CNNs) and recurrent neural networks (RNNs), together with i-vector representations and various fusion methods.

Author countries

Finland, France, Japan, United Kingdom
