Novel Speech Features for Improved Detection of Spoofing Attacks

Authors: Dipjyoti Paul, Monisankha Pal, Goutam Saha

Published: 2016-03-14 13:49:18+00:00

Comment: Presented at the 2015 Annual IEEE India Conference (INDICON)

AI Summary

This paper proposes novel speech features to improve the detection of spoofing attacks against Automatic Speaker Verification (ASV) systems. These features are derived using an alternative frequency-warping technique and formant-specific block transformation of filter bank log energies. Evaluated on the ASVspoof 2015 corpora, the proposed techniques outperform existing methods, achieving 0% Equal Error Rate (EER) for natural and synthetic speech classification.

Abstract

Nowadays, speech-based biometric systems such as automatic speaker verification (ASV) are highly prone to spoofing attacks by an impostor. With recent developments in voice conversion (VC) and speech synthesis (SS) algorithms, these spoofing attacks can pose a serious threat to current state-of-the-art ASV systems. To impede such attacks and enhance the security of ASV systems, it is essential to develop efficient anti-spoofing algorithms that can differentiate synthetic or converted speech from natural or human speech. In this paper, we propose a set of novel speech features for detecting spoofing attacks. The proposed features are computed using an alternative frequency-warping technique and formant-specific block transformation of filter bank log energies. We have evaluated existing and proposed features against several kinds of synthetic speech data from the ASVspoof 2015 corpora. The results show that the proposed techniques outperform existing approaches for various spoofing attack detection tasks. The techniques investigated in this paper can also accurately classify natural and synthetic speech, as equal error rates (EERs) of 0% have been achieved.


Key findings
The proposed inverted-scale features, particularly ISOBT with dynamic coefficients (∆∆2), achieved a 0% EER, significantly outperforming conventional features in detecting spoofing attacks. The study highlights the importance of high-frequency components, dynamic feature coefficients, block-based transformations, and speech-signal-based frequency warping for effective anti-spoofing countermeasures. These elements provide more discriminative information to distinguish between natural and synthetic speech.
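To make the EER figure concrete, the following is a minimal sketch of how an equal error rate can be approximated from detector scores. It is not the paper's evaluation code: the function name `equal_error_rate`, the convention that a higher score means "more likely natural", and the toy scores in the usage note are all assumptions for illustration. An EER of 0% corresponds to the case where some threshold separates the two score sets perfectly.

```python
def equal_error_rate(natural_scores, spoof_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumed convention: higher score = more likely natural speech.
    Returns a fraction in [0, 1].
    """
    thresholds = sorted(set(natural_scores) | set(spoof_scores))
    # Also consider a threshold above every score ("reject everything").
    thresholds.append(thresholds[-1] + 1.0)

    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection rate: natural speech scored below the threshold.
        frr = sum(s < t for s in natural_scores) / len(natural_scores)
        # False acceptance rate: spoofed speech scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2.0
    return eer
```

For example, with hypothetical scores `[2.0, 3.0, 4.0]` for natural speech and `[-1.0, 0.0, 1.0]` for spoofed speech, a threshold of 2.0 makes no errors, so the function returns 0.0.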
Approach
The authors propose novel 'inverted-scale' cepstral features (IMFCC, IMOBT, ISFCC, ISOBT) to capture high-frequency information, which is often distorted in synthetic speech. These features are computed using alternative frequency-warping techniques (inverted mel-frequency scale, speech-signal-based warping) and formant-specific block transformations of filter bank log energies. A GMM-ML classifier is then utilized with these features to discriminate between natural and synthetic speech.
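The two ideas in this approach can be sketched as follows, under stated assumptions. The inverted mel scale is written here in one common formulation, as the mel curve mirrored about the analysis band so that filters are densest at high frequencies; the block transformation is shown as a DCT applied separately to consecutive groups of filter bank log energies. All function names, the block sizes, and the use of non-overlapping blocks are hypothetical choices for illustration; the paper's actual formant-specific block boundaries and its speech-signal-based warping are not given in this summary and are not reproduced here.

```python
import math


def mel(f_hz):
    """Standard mel warping of a frequency in Hz."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def inv_mel(m):
    """Inverse of mel(): mel value back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def inverted_mel(f_hz, f_max_hz):
    """Mel curve mirrored about the band [0, f_max_hz] (assumed formulation)."""
    return mel(f_max_hz) - mel(f_max_hz - f_hz)


def inverted_mel_centres(n_filters, f_max_hz):
    """Filter centre frequencies (Hz) equally spaced on the inverted scale.

    This is the mel-spaced set of centres flipped around the band.
    """
    step = mel(f_max_hz) / (n_filters + 1)
    mel_centres = [inv_mel(step * (i + 1)) for i in range(n_filters)]
    return sorted(f_max_hz - c for c in mel_centres)


def dct2(x):
    """Unnormalised DCT-II of a list of floats."""
    n = len(x)
    return [
        sum(x[j] * math.cos(math.pi * k * (j + 0.5) / n) for j in range(n))
        for k in range(n)
    ]


def block_dct(log_energies, block_sizes):
    """Apply a DCT to each consecutive block and concatenate the results."""
    if sum(block_sizes) != len(log_energies):
        raise ValueError("block sizes must cover all filter bank outputs")
    out, start = [], 0
    for size in block_sizes:
        out.extend(dct2(log_energies[start:start + size]))
        start += size
    return out
```

Calling `inverted_mel_centres(20, 4000.0)` gives centres that are widely spaced at low frequencies and tightly spaced near 4 kHz, which is the opposite of the usual mel filter bank. Passing the resulting log energies through `block_dct` with, say, `block_sizes=[10, 10]` decorrelates each band separately instead of the whole spectrum at once.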
Datasets
ASVspoof 2015 corpus
Model(s)
Gaussian Mixture Model - Maximum Likelihood (GMM-ML) classifier
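A rough sketch of how a pair of maximum-likelihood-trained GMMs might be used for this decision: one model for natural speech, one for synthetic speech, with an utterance scored by the average per-frame log-likelihood ratio. This is an assumption about the scoring step, not a description taken from the paper, and training (typically EM) is omitted. The names `gmm_frame_loglik` and `llr_score`, the diagonal-covariance form, and the `(weights, means, variances)` tuple layout are all hypothetical.

```python
import math


def gmm_frame_loglik(x, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    comp = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            # Log of a univariate Gaussian density, summed over dimensions.
            ll += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        comp.append(ll)
    # Log-sum-exp over components for numerical stability.
    m = max(comp)
    return m + math.log(sum(math.exp(c - m) for c in comp))


def llr_score(frames, natural_gmm, synthetic_gmm):
    """Average per-frame log-likelihood ratio; positive favours natural speech.

    Each GMM is a (weights, means, variances) tuple (assumed layout).
    """
    nat = sum(gmm_frame_loglik(x, *natural_gmm) for x in frames)
    syn = sum(gmm_frame_loglik(x, *synthetic_gmm) for x in frames)
    return (nat - syn) / len(frames)
```

With toy one-dimensional models, for instance a natural model centred at 0.0 and a synthetic model centred at 3.0, both with unit variance, frames near 0.0 yield a positive score and frames near 3.0 a negative one. The resulting scores can then be thresholded to accept or reject an utterance.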
Author countries
India