Black-box Attacks on Automatic Speaker Verification using Feedback-controlled Voice Conversion

Authors: Xiaohai Tian, Rohan Kumar Das, Haizhou Li

Published: 2019-09-17 08:54:17+00:00

AI Summary

This paper proposes a feedback-controlled voice conversion (VC) framework for black-box attacks on automatic speaker verification (ASV) systems. The framework uses ASV system output scores as feedback to optimize the VC system, generating adversarial samples more deceptive than standard VC methods while maintaining good perceptual quality.

Abstract

Automatic speaker verification (ASV) systems in practice are highly vulnerable to spoofing attacks. The latest voice conversion technologies are able to produce perceptually natural-sounding speech that mimics any target speaker. However, perceptual closeness to a speaker's identity may not be enough to deceive an ASV system. In this work, we propose a framework that uses the output scores of an ASV system as feedback to a voice conversion system. The attacker framework is a black-box adversary that steals one's voice identity, because it requires no knowledge of the ASV system other than its outputs. Experiments conducted on the ASVspoof 2019 database confirm that the proposed feedback-controlled voice conversion framework produces adversarial samples that are more deceptive than those of straightforward voice conversion, thereby boosting the impostor ASV scores. Further, perceptual evaluation studies reveal that the feedback control does not degrade the voice quality of the converted speech relative to the baseline system.

Key findings

The feedback-controlled VC system significantly increased the ASV scores of converted speech, making it more effective for spoofing attacks. Perceptual evaluation showed no significant difference in quality or similarity between the feedback-controlled and baseline VC systems.

Approach

The authors propose a black-box attack using a feedback loop between a voice conversion system and an ASV system. The ASV system's output scores are used as feedback to train the VC system, aiming to maximize the ASV scores of converted speech. This approach does not require knowledge of the ASV system's internal workings.
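The idea of a score-only feedback loop can be illustrated with a toy sketch. This is not the authors' training procedure: it swaps in a simple gradient-free random search, and every name in it (`make_black_box_asv`, `convert`, `feedback_attack`, the cosine scorer, the offset-based "converter") is a hypothetical stand-in chosen for illustration. The only property it shares with the approach described above is that the attacker queries the verifier for scores and never inspects its internals.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def make_black_box_asv(target_embedding):
    """Hypothetical stand-in for an ASV system: exposes only a scalar score."""
    def score(features):
        return cosine(features, target_embedding)
    return score


def convert(source, offset):
    """Toy 'voice conversion': shift the source features by a tunable offset."""
    return [s + o for s, o in zip(source, offset)]


def feedback_attack(source, asv_score, steps=500, step_size=0.1, seed=0):
    """Tune the converter using only ASV scores as feedback (random search).

    Assumption: the attacker may query asv_score freely but sees nothing else.
    Returns the tuned offset and the best score after each step.
    """
    rng = random.Random(seed)
    offset = [0.0] * len(source)
    best = asv_score(convert(source, offset))
    history = [best]
    for _ in range(steps):
        # Propose a small random change to the converter parameters.
        candidate = [o + rng.gauss(0.0, step_size) for o in offset]
        candidate_score = asv_score(convert(source, candidate))
        # Keep the change only if the verifier's score went up.
        if candidate_score > best:
            offset, best = candidate, candidate_score
        history.append(best)
    return offset, history


if __name__ == "__main__":
    source = [1.0, 0.0, 0.0, 0.0]   # toy source-speaker features
    target = [0.0, 1.0, 0.0, 0.0]   # toy target-speaker embedding
    asv = make_black_box_asv(target)
    _, history = feedback_attack(source, asv)
    print(f"score before: {history[0]:.3f}, after: {history[-1]:.3f}")
```

In this toy setting the best score can only rise or stay flat, because a candidate is kept only when the verifier scores it higher. A real system would also need a term that preserves speech quality, which the sketch omits.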

Datasets

ASVspoof 2019 corpus (logical access subset)

Model(s)

PPG-based voice conversion (PPG-VC) with and without ASV feedback (PPG-VC-FC); i-vector based ASV system.

Author countries

Singapore, China
