DNN Filter Bank Cepstral Coefficients for Spoofing Detection

Authors: Hong Yu, Zheng-Hua Tan, Zhanyu Ma, Jun Guo

Published: 2017-02-13 14:44:17+00:00

AI Summary

This paper introduces Deep Neural Network Filter Bank Cepstral Coefficients (DNN-FBCC) for distinguishing between natural and spoofed speech, aiming to improve automatic speaker verification system reliability. The DNN filter bank is automatically generated by training a Filter Bank Neural Network (FBNN) using natural and synthetic speech, with restrictions to create band-limited, frequency-sorted filters. Experimental results on the ASVspoof 2015 database demonstrate that a Gaussian Mixture Model maximum-likelihood (GMM-ML) classifier using DNN-FBCC outperforms state-of-the-art LFCC, particularly in detecting unknown attacks.

Abstract

With the development of speech synthesis techniques, automatic speaker verification systems face the serious challenge of spoofing attacks. In order to improve the reliability of speaker verification systems, we develop a new filter bank based cepstral feature, deep neural network filter bank cepstral coefficients (DNN-FBCC), to distinguish between natural and spoofed speech. The deep neural network filter bank is automatically generated by training a filter bank neural network (FBNN) using natural and synthetic speech. By adding restrictions on the training rules, the learned weight matrix of FBNN is band-limited and sorted by frequency, similar to the normal filter bank. Unlike the manually designed filter bank, the learned filter bank has different filter shapes in different channels, which can capture the differences between natural and synthetic speech more effectively. The experimental results on the ASVspoof 2015 database show that the Gaussian mixture model maximum-likelihood (GMM-ML) classifier trained on the new feature performs better than the state-of-the-art linear frequency cepstral coefficients (LFCC) based classifier, especially on detecting unknown attacks.


Key findings
The proposed DNN-FBCC features, particularly DNN-IGFCC(∆∆2), achieved better spoofing detection performance than manually designed cepstral features (LFCC, RFCC, GFCC, IGFCC) and other data-driven features. The learned filter banks, trained with suitable band-limiting and shape restrictions, proved more effective at capturing the characteristics that discriminate natural from synthetic speech. The approach improved accuracy on the ASVspoof 2015 database, especially for unknown spoofing attacks.
Approach
The authors propose Deep Neural Network Filter Bank Cepstral Coefficients (DNN-FBCC) by training a Filter Bank Neural Network (FBNN) on natural and synthetic speech. Restrictions are applied to the FBNN's weight matrix during training to ensure the learned filters are non-negative, band-limited, and frequency-ordered. These learned filter banks are then used for cepstral analysis to extract DNN-FBCC features, which are subsequently fed into a Gaussian Mixture Model maximum-likelihood (GMM-ML) classifier for spoofing detection.
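The constraints and the cepstral step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the linearly spaced overlapping bands, the hard projection in `constrain`, the function names, and the DCT-II cepstral transform are all assumptions chosen for illustration. The summary only states that the learned filters are non-negative, band-limited, and frequency-ordered.

```python
import numpy as np

def band_limit_mask(n_filters, n_bins):
    """Binary mask giving each filter one contiguous band.

    Assumption: bands are linearly spaced and overlap their neighbours;
    the actual band layout used in the paper is not given in this summary.
    """
    edges = np.linspace(0, n_bins - 1, n_filters + 2)
    bins = np.arange(n_bins)
    mask = np.zeros((n_filters, n_bins))
    for k in range(n_filters):
        # Filter k covers the bins between edge k and edge k + 2,
        # so band centres increase with k (frequency-ordered).
        mask[k] = (bins >= edges[k]) & (bins <= edges[k + 2])
    return mask

def constrain(weights, mask):
    """Project raw weights onto non-negative, band-limited filters.

    Assumption: a hard clip-and-mask projection stands in for whatever
    training-time restriction the FBNN actually uses.
    """
    return np.maximum(weights, 0.0) * mask

def dct_ii(n_ceps, n_filters):
    """Unnormalised DCT-II basis of shape (n_ceps, n_filters)."""
    n = np.arange(n_filters)
    k = np.arange(n_ceps)[:, None]
    return np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))

def filter_bank_cepstra(power_spec, filters, n_ceps):
    """Filter bank energies -> log -> DCT, one row of cepstra per frame."""
    energies = power_spec @ filters.T        # (frames, n_filters)
    log_energies = np.log(energies + 1e-10)  # small floor avoids log(0)
    return log_energies @ dct_ii(n_ceps, filters.shape[0]).T
```

In a training loop, a projection like `constrain` would be applied to the filter bank layer's weight matrix after each update, and the resulting `filters` matrix would replace a hand-designed filter bank at feature extraction time.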
Datasets
ASVspoof 2015 database
Model(s)
Filter Bank Neural Network (FBNN), Gaussian Mixture Model Maximum-Likelihood (GMM-ML) classifier
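As a rough illustration of how a GMM-based detector can score an utterance, the sketch below assumes the common two-model setup: one GMM fitted to natural speech, one to spoofed speech, with the score taken as the difference of average per-frame log-likelihoods. The summary does not spell out the scoring rule, so this setup, the diagonal covariances, and the names `gmm_avg_loglik` and `llr_score` are assumptions. Model fitting (for example by EM) is omitted; each model is a hypothetical `(weights, means, variances)` tuple.

```python
import numpy as np

def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (M,); means and variances: (M, D).
    """
    diff = frames[:, None, :] - means[None, :, :]             # (T, M, D)
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2 * np.pi * variances), axis=1)
    )                                                          # (T, M)
    log_joint = log_gauss + np.log(weights)
    # Log-sum-exp over mixture components, stabilised by the row maximum.
    peak = log_joint.max(axis=1, keepdims=True)
    log_frame = peak[:, 0] + np.log(np.exp(log_joint - peak).sum(axis=1))
    return log_frame.mean()

def llr_score(frames, natural_gmm, spoof_gmm):
    """Positive scores favour natural speech, negative favour spoofed."""
    return gmm_avg_loglik(frames, *natural_gmm) - gmm_avg_loglik(frames, *spoof_gmm)
```

Here `frames` would hold the cepstral feature vectors of one utterance, and a threshold on the returned score would give the accept or reject decision.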
Author countries
China, Denmark