Device Feature based on Graph Fourier Transformation with Logarithmic Processing For Detection of Replay Speech Attacks

Authors: Mingrui He, Longting Xu, Han Wang, Mingjun Zhang, Rohan Kumar Das

Published: 2024-04-26 09:36:49+00:00

AI Summary

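This paper proposes three audio features for detecting replay speech attacks on automatic speaker verification systems: the graph frequency logarithmic coefficient (GFLC), the graph frequency device cepstral coefficient (GFDCC), and the graph frequency logarithmic device coefficient (GFLDC). The features are derived from the graph Fourier transform combined with logarithmic processing and a device-related linear transformation, addressing the device and environmental noise effects that earlier graph Fourier transform features ignored. The proposed features outperform existing front-ends on the ASVspoof 2017 V2, ASVspoof 2019 physical access, and ASVspoof 2021 physical access datasets.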

Abstract

The most common spoofing attacks on automatic speaker verification systems are replay speech attacks. Detection of replay speech heavily relies on replay configuration information. Previous studies have shown that graph Fourier transform-derived features can effectively detect replay speech but ignore device and environmental noise effects. In this work, we propose a new feature, the graph frequency device cepstral coefficient, derived from the graph frequency domain using a device-related linear transformation. We also introduce two novel representations: graph frequency logarithmic coefficient and graph frequency logarithmic device coefficient. We evaluate our methods using traditional Gaussian mixture model and light convolutional neural network systems as classifiers. On the ASVspoof 2017 V2, ASVspoof 2019 physical access, and ASVspoof 2021 physical access datasets, our proposed features outperform known front-ends, demonstrating their effectiveness for replay speech detection.
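
To make the phrase "graph frequency domain" concrete, the following is a minimal sketch of a graph Fourier transform of one speech frame followed by logarithmic processing. It is not the authors' feature extractor: the path graph over the samples of a frame, the function names `path_graph_laplacian` and `graph_log_spectrum`, the log of squared coefficients, and the `eps` floor are all assumptions made for illustration.

```python
import numpy as np


def path_graph_laplacian(n):
    """Combinatorial Laplacian L = D - W of an unweighted path graph on n nodes.

    Assumption: neighbouring samples of a frame are connected with weight 1.
    The paper may construct its graph differently.
    """
    w = np.zeros((n, n))
    idx = np.arange(n - 1)
    w[idx, idx + 1] = 1.0
    w[idx + 1, idx] = 1.0
    return np.diag(w.sum(axis=1)) - w


def graph_log_spectrum(frame, eps=1e-10):
    """Graph Fourier transform of one frame, then the log of each squared coefficient."""
    frame = np.asarray(frame, dtype=float)
    lap = path_graph_laplacian(len(frame))
    # eigh returns eigenvalues in ascending order; they act as graph frequencies,
    # and the columns of u form the graph Fourier basis.
    graph_freqs, u = np.linalg.eigh(lap)
    coeffs = u.T @ frame  # forward graph Fourier transform
    # Assumed form of the logarithmic processing: log energy with a small floor.
    return graph_freqs, np.log(coeffs ** 2 + eps)
```

Calling `graph_log_spectrum(frame)` on a windowed frame returns the graph frequencies and one log-energy value per frequency. Any further step, such as a cepstral or device-related transformation, would operate on that vector.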


Key findings
The proposed features (GFLC, GFDCC, and GFLDC) outperform the baseline and state-of-the-art front-ends they are compared against on all three datasets. Adding logarithmic processing and device information improves detection accuracy. The gain is most notable when systems trained on ASVspoof 2017 are evaluated on the more realistic ASVspoof 2021 dataset.
Approach
The authors propose three new audio features: GFLC, which adds logarithmic processing, and GFDCC and GFLDC, which both incorporate a device-related linear transformation. The features are evaluated with GMM and LCNN classifiers on the three ASVspoof datasets. The parameters of the device-related linear transformation are learned from parallel genuine and replay speech, aligned with dynamic time warping, so that device-related information can be separated.
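The alignment-then-fit idea can be sketched as follows. This is an illustrative stand-in, not the paper's procedure: the Euclidean frame distance, the ordinary least-squares affine fit, and the names `align_with_dtw` and `fit_affine` are assumptions, and the summary does not state how the authors parameterise or estimate the transformation.

```python
import numpy as np


def align_with_dtw(genuine, replay):
    """Align two feature sequences (frames x dims) with dynamic time warping.

    Returns two arrays of equal length holding the paired frames.
    """
    n, m = len(genuine), len(replay)
    # Pairwise Euclidean distances between frames (assumed local cost).
    dist = np.linalg.norm(genuine[:, None, :] - replay[None, :, :], axis=2)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]
            )
    # Backtrack from the last cell to recover the optimal warping path.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        move = int(np.argmin((cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])))
        if move == 0:
            i, j = i - 1, j - 1
        elif move == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    gi = [p[0] for p in path]
    ri = [p[1] for p in path]
    return genuine[gi], replay[ri]


def fit_affine(genuine_aligned, replay_aligned):
    """Least-squares fit of replay ≈ genuine @ a + b on aligned frame pairs."""
    ones = np.ones((len(genuine_aligned), 1))
    design = np.hstack([genuine_aligned, ones])
    params, *_ = np.linalg.lstsq(design, replay_aligned, rcond=None)
    return params[:-1], params[-1]  # matrix a, offset b
```

A typical call would be `a, b = fit_affine(*align_with_dtw(genuine_feats, replay_feats))`, where both inputs are hypothetical per-utterance feature matrices from a parallel genuine and replay recording. The sketch aligns on the same features it later maps, which presumes the device effect is mild enough for the alignment to remain meaningful.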
Datasets
ASVspoof 2017 V2, ASVspoof 2019 physical access, and ASVspoof 2021 physical access datasets
Model(s)
Gaussian Mixture Model (GMM), Light Convolutional Neural Network (LCNN)
Author countries
China, Singapore