Malacopula: adversarial automatic speaker verification attacks using a neural-based generalised Hammerstein model

Authors: Massimiliano Todisco, Michele Panariello, Xin Wang, Héctor Delgado, Kong Aik Lee, Nicholas Evans

Published: 2024-08-17 21:58:11+00:00

AI Summary

Malacopula, a neural-based generalized Hammerstein model, generates adversarial perturbations for spoofed speech to deceive automatic speaker verification (ASV) systems. It enhances spoofing attacks by using non-linear processes to modify speech, minimizing the cosine distance between spoofed and bona fide speaker embeddings. Experiments show substantial vulnerability increases, though speech quality degrades and attacks are detectable under controlled conditions.

Abstract

We present Malacopula, a neural-based generalised Hammerstein model designed to introduce adversarial perturbations to spoofed speech utterances so that they better deceive automatic speaker verification (ASV) systems. Using non-linear processes to modify speech utterances, Malacopula enhances the effectiveness of spoofing attacks. The model comprises parallel branches of polynomial functions followed by linear time-invariant filters. The adversarial optimisation procedure acts to minimise the cosine distance between speaker embeddings extracted from spoofed and bona fide utterances. Experiments, performed using three recent ASV systems and the ASVspoof 2019 dataset, show that Malacopula increases vulnerabilities by a substantial margin. However, speech quality is reduced and attacks can be detected effectively under controlled conditions. The findings emphasise the need to identify new vulnerabilities and design defences to protect ASV systems from adversarial attacks in the wild.


Key findings
Malacopula substantially increased the vulnerability of three ASV systems to spoofing attacks. However, the attacks reduced speech quality and were detectable by a spoofing detection system (AASIST) under controlled conditions. This highlights the need for improved ASV defenses and further research into detection in real-world scenarios.
Approach
Malacopula uses a generalized Hammerstein model to modify spoofed speech: the input passes through parallel branches, each applying a polynomial function followed by a linear time-invariant filter. An adversarial optimization procedure minimizes the cosine distance between speaker embeddings extracted from the modified spoofed utterances and from bona fide utterances. Pulling the spoofed embedding toward the bona fide one makes the ASV system more likely to accept the spoofed speech as genuine.
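The structure described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that branch k raises each sample to the k-th power, that each filter is a short causal FIR filter, and that the branch outputs are summed. The function names (`fir_filter`, `hammerstein`, `cosine_distance`) and the tap values are hypothetical, and the neural, learned part of Malacopula is omitted; in the real system the filter coefficients would be optimized so as to reduce the cosine distance.

```python
import math


def fir_filter(signal, taps):
    """Causal FIR (linear time-invariant) filter: y[n] = sum_m taps[m] * signal[n - m]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for m, tap in enumerate(taps):
            if n - m >= 0:
                acc += tap * signal[n - m]
        out.append(acc)
    return out


def hammerstein(signal, branch_taps):
    """Generalized Hammerstein forward pass (illustrative assumption, not the paper's code).

    branch_taps[k - 1] holds the FIR taps of branch k. Each branch applies the
    static polynomial non-linearity x -> x**k, then its filter; the branch
    outputs are summed sample by sample.
    """
    output = [0.0] * len(signal)
    for order, taps in enumerate(branch_taps, start=1):
        powered = [s ** order for s in signal]   # polynomial function
        filtered = fir_filter(powered, taps)     # linear time-invariant filter
        output = [o + f for o, f in zip(output, filtered)]
    return output


def cosine_distance(u, v):
    """1 - cosine similarity between two embedding vectors (0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical usage: two branches, so y[n] = x[n] + 0.5 * x[n]**2.
spoofed = [0.5, -0.25, 0.1]
perturbed = hammerstein(spoofed, [[1.0], [0.5]])
```

In an attack loop, `perturbed` would be passed to a speaker embedding extractor and `cosine_distance` between that embedding and a bona fide embedding would serve as the loss to minimize; the extractor itself is outside the scope of this sketch.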
Datasets
ASVspoof 2019 logical access (LA) dataset
Model(s)
Generalized Hammerstein model (the attack); CAM++, ECAPA, and ERes2Net ASV systems (evaluation targets); Adam optimizer.
Author countries
France, Japan, Spain, Hong Kong