Novel Speech Features for Improved Detection of Spoofing Attacks

Authors: Dipjyoti Paul, Monisankha Pal, Goutam Saha

Published: 2016-03-14 13:49:18+00:00

AI Summary

This paper proposes novel speech features for improved detection of spoofing attacks in automatic speaker verification systems. These features leverage alternative frequency warping and formant-specific block transformation of filter bank log energies, significantly outperforming existing methods.

Abstract

Nowadays, speech-based biometric systems such as automatic speaker verification (ASV) are highly prone to spoofing attacks by an impostor. With recent developments in voice conversion (VC) and speech synthesis (SS) algorithms, these spoofing attacks can pose a serious threat to current state-of-the-art ASV systems. To impede such attacks and enhance the security of ASV systems, it is essential to develop efficient anti-spoofing algorithms that can differentiate synthetic or converted speech from natural (human) speech. In this paper, we propose a set of novel speech features for detecting spoofing attacks. The proposed features are computed using an alternative frequency-warping technique and formant-specific block transformation of filter bank log energies. We have evaluated existing and proposed features against several kinds of synthetic speech data from the ASVspoof 2015 corpus. The results show that the proposed techniques outperform existing approaches on various spoofing attack detection tasks. The techniques investigated in this paper can also accurately classify natural and synthetic speech, as equal error rates (EERs) of 0% have been achieved.


Key findings
The proposed features significantly outperform existing approaches, achieving 0% equal error rate (EER) in classifying natural and synthetic speech for several spoofing attacks in the ASVspoof 2015 development set. The use of inverted frequency warping and dynamic features proved particularly effective.
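For readers unfamiliar with the metric, the EER is the operating point at which the false acceptance rate (spoofed speech accepted as natural) equals the false rejection rate (natural speech rejected). The following is a minimal sketch of one way to estimate it from detector scores, assuming that higher scores indicate natural speech. The function name and the threshold sweep are illustrative choices, not the evaluation code used by the authors.

```python
import numpy as np

def equal_error_rate(genuine_scores, spoof_scores):
    """Estimate the EER, assuming higher scores mean 'more likely natural'."""
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    # Candidate thresholds: every observed score, plus one above the maximum.
    thresholds = np.sort(np.concatenate([genuine, spoof]))
    thresholds = np.append(thresholds, thresholds[-1] + 1.0)
    # False rejection rate: natural trials scored below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance rate: spoofed trials scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    # Pick the threshold where the two error rates are closest.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0
```

An EER of 0% corresponds to the case where some threshold separates every natural trial from every spoofed trial, so both error rates are zero at once.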
Approach
The authors propose new audio features computed using an inverted frequency warping technique and formant-specific block transformations of filter bank log energies. These features are then used with a Gaussian Mixture Model (GMM) classifier to distinguish between real and synthetic speech.
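As a rough illustration of the two ingredients named above, the sketch below builds an inverted mel filter bank by flipping a standard mel filter bank along both axes (a common construction for inverted-mel features) and applies a separate DCT to consecutive blocks of filter bank log energies. The filter count, FFT size, block sizes, and all function names are assumptions made for the example; the summary does not give the authors' exact settings or how blocks are aligned with formant regions.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced uniformly on the mel scale."""
    n_bins = n_fft // 2 + 1
    mel_edges = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    fb = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = hz_edges[i:i + 3]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def inverted_mel_filterbank(n_filters, n_fft, sample_rate):
    """Flip filter order and frequency axis: narrow filters move to high frequencies."""
    return mel_filterbank(n_filters, n_fft, sample_rate)[::-1, ::-1]

def dct_ii(x):
    """Unnormalized DCT-II along the last axis."""
    n = x.shape[-1]
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T

def block_dct_features(power_spectrum, filterbank, block_sizes):
    """Apply a separate DCT to each block of filter bank log energies.

    block_sizes is a hypothetical split of the filters into bands; it must
    sum to the number of filters.
    """
    if sum(block_sizes) != filterbank.shape[0]:
        raise ValueError("block sizes must cover every filter exactly once")
    log_energies = np.log(power_spectrum @ filterbank.T + 1e-10)
    blocks, start = [], 0
    for size in block_sizes:
        blocks.append(dct_ii(log_energies[..., start:start + size]))
        start += size
    return np.concatenate(blocks, axis=-1)
```

Given a matrix of per-frame power spectra with shape (frames, n_fft // 2 + 1), a call such as block_dct_features(power, inverted_mel_filterbank(20, 512, 16000), (6, 6, 8)) returns one 20-dimensional vector per frame. Dynamic (delta) coefficients would be appended in a later step that is not shown here.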
Datasets
ASVspoof 2015 corpus
Model(s)
Gaussian Mixture Model (GMM) classifier
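A common way to use GMMs for this kind of two-class decision is to train one model on natural speech and one on spoofed speech, then score an utterance by the difference of its average per-frame log-likelihoods. The sketch below shows only that scoring step for diagonal-covariance GMMs whose parameters are already available. The two-model log-likelihood-ratio setup, the diagonal covariances, and the function names are assumptions for illustration; the summary states only that a GMM classifier is used.

```python
import numpy as np

def diag_gmm_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (K,); means and variances: (K, D).
    """
    x = np.asarray(frames, dtype=float)[:, None, :]   # (T, 1, D)
    mu = np.asarray(means, dtype=float)[None]         # (1, K, D)
    var = np.asarray(variances, dtype=float)[None]    # (1, K, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * var), axis=-1)  # (1, K)
    log_exp = -0.5 * np.sum((x - mu) ** 2 / var, axis=-1)         # (T, K)
    log_comp = np.log(weights) + log_norm + log_exp               # (T, K)
    # Log-sum-exp over components, shifted by the max for numerical stability.
    peak = log_comp.max(axis=1, keepdims=True)
    log_frame = peak[:, 0] + np.log(np.exp(log_comp - peak).sum(axis=1))
    return log_frame.mean()

def llr_score(frames, natural_gmm, spoof_gmm):
    """Positive scores favour natural speech, negative scores favour spoofed speech."""
    return diag_gmm_loglik(frames, *natural_gmm) - diag_gmm_loglik(frames, *spoof_gmm)
```

Each model is passed as a (weights, means, variances) tuple. Estimating those parameters, typically with expectation-maximization, is outside the scope of this sketch.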
Author countries
India