DNN Filter Bank Cepstral Coefficients for Spoofing Detection


Authors: Hong Yu, Zheng-Hua Tan, Zhanyu Ma, Jun Guo

Published: 2017-02-13 14:44:17+00:00

AI Summary

This paper proposes DNN-FBCC, a new filter bank based cepstral feature for spoofing detection in speaker verification systems. A filter bank neural network (FBNN) automatically learns filter banks from natural and synthetic speech, outperforming manually designed filter banks and improving detection of unknown attacks.

Abstract

With the development of speech synthesis techniques, automatic speaker verification systems face the serious challenge of spoofing attacks. In order to improve the reliability of speaker verification systems, we develop a new filter bank based cepstral feature, deep neural network filter bank cepstral coefficients (DNN-FBCC), to distinguish between natural and spoofed speech. The deep neural network filter bank is automatically generated by training a filter bank neural network (FBNN) using natural and synthetic speech. By adding restrictions on the training rules, the learned weight matrix of FBNN is band-limited and sorted by frequency, similar to the normal filter bank. Unlike the manually designed filter bank, the learned filter bank has different filter shapes in different channels, which can capture the differences between natural and synthetic speech more effectively. The experimental results on the ASVspoof 2015 database show that the Gaussian mixture model maximum-likelihood (GMM-ML) classifier trained on the new feature performs better than the state-of-the-art linear frequency cepstral coefficients (LFCC) based classifier, especially on detecting unknown attacks.

Key findings

DNN-FBCC features, particularly those using an inverted Gammatone filter bank, significantly outperform manually designed cepstral features and other data-driven features in detecting both known and unknown spoofing attacks on the ASVspoof 2015 dataset. The learned filter banks show flexibility and effectiveness in capturing differences between natural and synthetic speech.

Approach

The authors propose deep neural network filter bank cepstral coefficients (DNN-FBCC), a cepstral feature for improved spoofing detection. A filter bank neural network (FBNN) is trained on natural and synthetic speech, learning band-limited, frequency-ordered filter shapes optimized for distinguishing between them. Cepstral coefficients are then extracted from the outputs of the learned filter bank.
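The extraction step can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that band-limiting is enforced with a fixed 0/1 mask around linearly spaced centre bins, that filter weights are made non-negative with an absolute value, and that cepstra come from a type-II DCT of log filter bank energies. The function names, the sizes (257 bins, 20 filters, 13 coefficients), and the random weights standing in for a trained FBNN are all hypothetical.

```python
import numpy as np

def band_limit_mask(n_filters, n_bins, half_width):
    """0/1 mask that keeps each filter inside a band around its centre bin.

    Centres are linearly spaced, so the rows are sorted by frequency.
    (Assumed construction, for illustration only.)
    """
    centres = np.linspace(0, n_bins - 1, n_filters + 2)[1:-1]
    bins = np.arange(n_bins)
    return (np.abs(bins[None, :] - centres[:, None]) <= half_width).astype(float)

def dct2_matrix(n_coeffs, n_filters):
    """Unnormalised type-II DCT basis, one row per cepstral coefficient."""
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(n_filters)[None, :]
    return np.cos(np.pi * k * (n + 0.5) / n_filters)

def filter_bank_cepstra(power_spec, weights, mask, n_coeffs, eps=1e-10):
    """power_spec: (frames, bins); weights and mask: (filters, bins)."""
    fbank = np.abs(weights) * mask            # non-negative, band-limited filters
    energies = power_spec @ fbank.T           # (frames, filters)
    log_energies = np.log(energies + eps)     # eps avoids log(0)
    return log_energies @ dct2_matrix(n_coeffs, fbank.shape[0]).T

# Random stand-ins for a power spectrogram and for trained FBNN weights.
rng = np.random.default_rng(0)
power_spec = rng.random((5, 257))
weights = rng.standard_normal((20, 257))
mask = band_limit_mask(20, 257, half_width=12)
cepstra = filter_bank_cepstra(power_spec, weights, mask, n_coeffs=13)
print(cepstra.shape)
```

Because the mask is fixed, each filter is free to take any shape inside its band, which is the property the summary attributes to the learned filter bank.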

Datasets

ASVspoof 2015 database (training, development, and evaluation sets)

Model(s)

Filter Bank Neural Network (FBNN) for feature extraction; Gaussian Mixture Model (GMM) for classification.
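One common way to use a pair of GMMs for this kind of detection is to score an utterance by the difference in average log-likelihood between a natural-speech model and a spoofed-speech model. The sketch below assumes diagonal covariances and uses hand-set toy parameters rather than trained models. The names `gmm_avg_loglik` and `llr_score` are hypothetical, and the source does not state that the authors score utterances exactly this way.

```python
import numpy as np

def gmm_avg_loglik(x, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    x: (frames, dim); weights: (K,); means and variances: (K, dim).
    """
    diff = x[:, None, :] - means[None, :, :]                          # (frames, K, dim)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)   # (K,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2) # (frames, K)
    log_joint = log_comp + np.log(weights)
    # Log-sum-exp over components, shifted by the maximum for stability.
    m = log_joint.max(axis=1, keepdims=True)
    frame_ll = m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))
    return frame_ll.mean()

def llr_score(x, natural_gmm, spoof_gmm):
    """Positive scores favour natural speech, negative scores favour spoofed."""
    return gmm_avg_loglik(x, *natural_gmm) - gmm_avg_loglik(x, *spoof_gmm)

# Toy two-component models in a 2-D feature space (illustrative values only).
natural_gmm = (np.array([0.5, 0.5]),
               np.array([[0.0, 0.0], [1.0, 1.0]]),
               np.ones((2, 2)))
spoof_gmm = (np.array([0.5, 0.5]),
             np.array([[4.0, 4.0], [5.0, 5.0]]),
             np.ones((2, 2)))

rng = np.random.default_rng(0)
natural_like = 0.5 + 0.1 * rng.standard_normal((50, 2))
spoof_like = 4.5 + 0.1 * rng.standard_normal((50, 2))
print(llr_score(natural_like, natural_gmm, spoof_gmm) > 0)
print(llr_score(spoof_like, natural_gmm, spoof_gmm) < 0)
```

In practice the two models would be fitted by maximum likelihood on DNN-FBCC features from the natural and spoofed training utterances, and the score would be compared against a threshold.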

Author countries

China, Denmark
