Tech. Report: Genuinization of Speech waveform PMF for speaker detection spoofing and countermeasures

Authors: Itshak Lapidot, Jean-Francois Bonastre

Published: 2023-10-09 08:56:31+00:00

AI Summary

This tech report investigates the impact of the waveform probability mass function (PMF) on spoofing detection in speaker recognition. It proposes a 'genuinization' algorithm that reduces the gap between the waveform PMFs of genuine and spoofed speech. Applied to attacks, genuinization makes them harder to detect; integrated into countermeasures, it improves spoofing detection performance.

Abstract

In the context of spoofing attacks in speaker recognition systems, we observed that the waveform probability mass function (PMF) of genuine speech differs significantly from the PMF of speech resulting from the attacks. This is true for synthesized or converted speech as well as replayed speech. We also noticed that this observation seems to have a significant impact on spoofing detection performance. In this article, we propose an algorithm, denoted genuinization, capable of reducing the waveform distribution gap between authentic speech and spoofing speech. Our genuinization algorithm is evaluated on ASVspoof 2019 challenge datasets, using the baseline system provided by the challenge organization. We first assess the influence of genuinization on spoofing performance. Using genuinization for the spoofing attacks degrades spoofing detection performance by up to a factor of 10. Next, we integrate the genuinization algorithm in the spoofing countermeasures and we observe a huge spoofing detection improvement in different cases. The results of our experiments show clearly that waveform distribution plays an important role and must be taken into account by anti-spoofing systems.


Key findings
Applying genuinization to spoofing attacks degrades spoofing detection performance by up to a factor of 10. Integrating genuinization into the countermeasures substantially improves spoofing detection in several cases. The low-energy parts of the audio signal, including non-speech portions, significantly impact countermeasure performance.
Approach
The authors propose a genuinization algorithm that modifies the waveform PMF of spoofed speech to resemble that of genuine speech. This is achieved through quantile normalization, adapting a method initially designed for continuous random variables to handle discrete audio samples.
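The idea of quantile normalization on discrete samples can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it is a generic CDF-matching (histogram-matching) procedure, shown only to make the concept concrete. The function name `match_waveform_pmf`, the assumption of 16-bit integer samples, and the use of a single reference signal to estimate the genuine-speech PMF are all assumptions made for this example.

```python
import numpy as np

def match_waveform_pmf(samples, reference, n_levels=65536, offset=32768):
    """Map integer samples so their empirical PMF approximates that of `reference`.

    Hypothetical helper: assumes 16-bit PCM amplitudes in [-32768, 32767].
    """
    samples = np.asarray(samples, dtype=np.int64)
    reference = np.asarray(reference, dtype=np.int64)

    # Empirical counts per amplitude level (shifted so that indices are >= 0).
    src_counts = np.bincount(samples + offset, minlength=n_levels)
    ref_counts = np.bincount(reference + offset, minlength=n_levels)

    # Empirical cumulative distribution functions over the amplitude levels.
    src_cdf = np.cumsum(src_counts) / samples.size
    ref_cdf = np.cumsum(ref_counts) / reference.size

    # Discrete quantile mapping: each source level goes to the smallest
    # reference level whose CDF reaches the source level's CDF value.
    mapping = np.searchsorted(ref_cdf, src_cdf, side="left")
    mapping = np.clip(mapping, 0, n_levels - 1)

    return (mapping[samples + offset] - offset).astype(np.int16)


# Toy usage: a narrow-amplitude "spoofed" signal mapped toward a wider reference.
rng = np.random.default_rng(0)
genuine = rng.integers(-2000, 2000, size=5000)
spoofed = rng.integers(-200, 200, size=5000)
converted = match_waveform_pmf(spoofed, genuine)
```

Because the samples are discrete, many samples share the same amplitude and the CDF is a step function, so the mapping is many-to-one in general and the resulting PMF only approximates the reference. How ties are handled is one of the points where a method designed for continuous variables has to be adapted; the sketch simply maps every occurrence of a level to the same target level.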
Datasets
ASVspoof 2019 challenge datasets (Logical Access and Physical Access)
Model(s)
Gaussian Mixture Model (GMM)
Author countries
Israel, France