Adversarial Transformation of Spoofing Attacks for Voice Biometrics

Authors: Alejandro Gomez-Alanis, Jose A. Gonzalez-Lopez, Antonio M. Peinado

Published: 2022-01-04 16:14:03+00:00

AI Summary

This paper introduces a novel Adversarial Biometrics Transformation Network (ABTN) to generate adversarial spoofing attacks against voice biometric systems. The ABTN jointly optimizes the loss functions of both the Presentation Attack Detection (PAD) and Automatic Speaker Verification (ASV) systems to create attacks that fool the PAD while remaining undetected by the ASV.

Abstract

Voice biometric systems based on automatic speaker verification (ASV) are exposed to spoofing attacks which may compromise their security. To increase the robustness against such attacks, anti-spoofing or presentation attack detection (PAD) systems have been proposed for the detection of replay, synthesis and voice conversion based attacks. Recently, the scientific community has shown that PAD systems are also vulnerable to adversarial attacks. However, to the best of our knowledge, no previous work has studied the robustness of full voice biometric systems (ASV + PAD) to these new types of adversarial spoofing attacks. In this work, we develop a new adversarial biometrics transformation network (ABTN) which jointly processes the loss of the PAD and ASV systems in order to generate white-box and black-box adversarial spoofing attacks. The core idea of this system is to generate adversarial spoofing attacks which are able to fool the PAD system without being detected by the ASV system. The experiments were carried out on the ASVspoof 2019 corpus, including both logical access (LA) and physical access (PA) scenarios. The experimental results show that the proposed ABTN clearly outperforms some well-known adversarial techniques in both white-box and black-box attack scenarios.


Key findings
The ABTN significantly outperforms the FGSM and PGD methods in both white-box and black-box attack scenarios, achieving higher joint equal error rate (EER_joint) values, i.e., more successful attacks against the complete biometric system. The generated attacks effectively fool the PAD system without impacting ASV performance, highlighting the vulnerability of voice biometric systems to adversarial attacks.
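For reference, below is a minimal sketch of the FGSM baseline that the ABTN is compared against, written as a targeted one-step attack on the PAD system only. This is an illustrative reading, not the authors' code; the names pad_model and epsilon, and the assumption that class 0 is "bona fide", are placeholders of this sketch.

    import torch
    import torch.nn.functional as F

    def fgsm_attack(x_spoof, pad_model, epsilon=1e-3):
        """Targeted FGSM: one gradient-sign step pushing the spoofing
        signal toward the PAD system's bona fide class."""
        x = x_spoof.clone().detach().requires_grad_(True)
        logits = pad_model(x)                                   # PAD logits for the spoofing input
        bona_fide = torch.zeros(x.shape[0], dtype=torch.long)   # assumed bona fide label
        loss = F.cross_entropy(logits, bona_fide)
        loss.backward()
        # Descend the bona fide loss so the PAD system misclassifies the attack
        x_adv = x - epsilon * x.grad.sign()
        return x_adv.detach()

PGD iterates a step of this form several times with projection onto an epsilon-ball; neither baseline takes the ASV system into account, which is the gap the ABTN targets.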
Approach
The authors propose an Adversarial Biometrics Transformation Network (ABTN) that transforms spoofing speech signals. The ABTN is trained to minimize a loss function that considers both the PAD and ASV system outputs, aiming to generate adversarial examples that fool the PAD while maintaining speaker information for ASV.
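As a rough illustration of this joint objective, the sketch below trains a transformation network with a PAD term (push the transformed spoofing signal toward the bona fide class) and an ASV term (keep the speaker embedding close to that of the original spoofing signal). It is a minimal PyTorch sketch under assumed interfaces, not the authors' implementation; abtn, pad_model, asv_embedder, the bona fide label 0, and the weight alpha are all placeholders.

    import torch
    import torch.nn.functional as F

    def abtn_training_step(abtn, pad_model, asv_embedder, x_spoof, optimizer, alpha=1.0):
        """One training step: fool the PAD system while preserving the
        speaker information seen by the ASV system."""
        optimizer.zero_grad()
        x_adv = abtn(x_spoof)                                   # transformed spoofing signal

        # PAD term: drive the adversarial signal toward the bona fide class
        pad_logits = pad_model(x_adv)
        bona_fide = torch.zeros(x_adv.shape[0], dtype=torch.long, device=x_adv.device)
        loss_pad = F.cross_entropy(pad_logits, bona_fide)

        # ASV term: keep the speaker embedding close to the original spoofing
        # signal so the ASV decision is left unchanged
        with torch.no_grad():
            emb_ref = asv_embedder(x_spoof)
        emb_adv = asv_embedder(x_adv)
        loss_asv = 1.0 - F.cosine_similarity(emb_adv, emb_ref).mean()

        loss = loss_pad + alpha * loss_asv
        loss.backward()
        optimizer.step()
        return loss.item()

Once trained, the network generates adversarial spoofing signals in a single forward pass, which is what enables the black-box setting studied in the paper.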
Datasets
ASVspoof 2019 (LA and PA scenarios), VoxCeleb1
Model(s)
LCNN, SENet50, TDNN (x-vector extractor), PLDA, b-vector system, ABTN
Author countries
Spain