One-class Learning Towards Synthetic Voice Spoofing Detection

Authors: You Zhang, Fei Jiang, Zhiyao Duan

Published: 2020-10-27 02:13:35+00:00

AI Summary

This paper proposes a one-class learning approach for detecting synthetic voice spoofing, with a focus on unknown attacks. The method compacts the bona fide speech representation and injects an angular margin to separate spoofing attacks in the embedding space. It achieves a 2.19% equal error rate (EER) on the evaluation set of the ASVspoof 2019 logical access scenario, outperforming all existing single systems (those without model ensembles).

Abstract

Human voices can be used to authenticate the identity of the speaker, but the automatic speaker verification (ASV) systems are vulnerable to voice spoofing attacks, such as impersonation, replay, text-to-speech, and voice conversion. Recently, researchers developed anti-spoofing techniques to improve the reliability of ASV systems against spoofing attacks. However, most methods encounter difficulties in detecting unknown attacks in practical use, which often have different statistical distributions from known attacks. Especially, the fast development of synthetic voice spoofing algorithms is generating increasingly powerful attacks, putting the ASV systems at risk of unseen attacks. In this work, we propose an anti-spoofing system to detect unknown synthetic voice spoofing attacks (i.e., text-to-speech or voice conversion) using one-class learning. The key idea is to compact the bona fide speech representation and inject an angular margin to separate the spoofing attacks in the embedding space. Without resorting to any data augmentation methods, our proposed system achieves an equal error rate (EER) of 2.19% on the evaluation set of ASVspoof 2019 Challenge logical access scenario, outperforming all existing single systems (i.e., those without model ensemble).


Key findings

The proposed one-class learning approach outperforms all existing single systems (those without model ensembles) on the evaluation set of the ASVspoof 2019 LA scenario, achieving a 2.19% EER without any data augmentation. The method generalizes better to unseen attacks than traditional binary classification methods. Embedding visualizations indicate that the proposed one-class softmax loss compacts bona fide speech and separates spoofing attacks.
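
Since the headline result is reported as an EER, a minimal sketch of how that metric can be computed from detection scores may help. It assumes higher scores mean "more likely bona fide"; the function name `compute_eer` and the toy scores are illustrative and do not come from the paper, and official challenge tooling may interpolate between thresholds rather than pick the closest crossing as done here.

```python
def compute_eer(bona_fide_scores, spoof_scores):
    """Return (eer, threshold) where false acceptance and false rejection meet.

    Assumption: a trial is accepted as bona fide when its score >= threshold.
    """
    thresholds = sorted(set(bona_fide_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bona_fide_scores) / len(bona_fide_scores)
        # False acceptance rate: spoof trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores (hypothetical, for illustration only).
eer, threshold = compute_eer([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
print(f"EER = {eer:.2%} at threshold {threshold}")
```

A lower EER means the two error rates cross at a lower value, so fewer bona fide utterances are rejected and fewer spoofed ones accepted at that operating point.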

Approach

The authors address the detection of unknown synthetic voice spoofing attacks with one-class learning. They propose a one-class softmax loss function that compacts bona fide speech embeddings while enforcing an angular margin between bona fide and spoofed speech in the embedding space. The approach does not rely on data augmentation and improves generalization to unseen attacks.
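
The idea of compacting one class while pushing the other away by an angular margin can be sketched as follows. This is one possible formulation consistent with the description above, not necessarily the authors' exact loss: it assumes a single learned direction `w`, cosine similarity between `w` and each embedding, and two margins. The names `one_class_margin_loss`, `m_bona`, `m_spoof`, and `alpha`, along with their default values, are illustrative assumptions.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def one_class_margin_loss(embeddings, labels, w,
                          m_bona=0.9, m_spoof=0.2, alpha=20.0):
    """Mean loss over a batch; labels use 0 = bona fide, 1 = spoof.

    Margins and scale are hypothetical defaults chosen for illustration.
    """
    w_hat = l2_normalize(w)
    total = 0.0
    for x, y in zip(embeddings, labels):
        cos = sum(a * b for a, b in zip(w_hat, l2_normalize(x)))
        if y == 0:
            # Bona fide: penalized when cosine similarity falls below m_bona,
            # which pulls bona fide embeddings tightly around w.
            z = alpha * (m_bona - cos)
        else:
            # Spoof: penalized when cosine similarity rises above m_spoof,
            # which pushes spoofed embeddings away from w.
            z = alpha * (cos - m_spoof)
        total += softplus(z)
    return total / len(embeddings)


# Toy check: well-separated embeddings give a small loss.
w = [1.0, 0.0]
print(one_class_margin_loss([[2.0, 0.0], [-1.0, 0.0]], [0, 1], w))
```

Because only the bona fide class is pulled toward a compact region, a spoofing attack unseen during training can still be rejected as long as its embedding falls outside that region, which is the intuition behind one-class learning here.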

Datasets

ASVspoof 2019 Challenge logical access (LA) scenario dataset

Model(s)

ResNet-18 architecture (with attentive temporal pooling replacing global average pooling)
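
To illustrate what replacing global average pooling with attentive temporal pooling can mean, here is a generic sketch of attention-weighted pooling over time. The summary does not specify the attention form used in the paper, so the scoring by a single vector `v` and the function name `attentive_temporal_pooling` are assumptions made purely for illustration.

```python
import math


def attentive_temporal_pooling(frames, v):
    """Pool T frame-level feature vectors into one utterance-level vector.

    frames: list of T vectors, each of length D.
    v: hypothetical learned attention vector of length D.
    Returns (pooled_vector, attention_weights).
    """
    # One scalar relevance score per frame.
    scores = [sum(a * b for a, b in zip(f, v)) for f in frames]
    # Softmax over the time axis (shifted by the max for stability).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Weighted sum of frames; uniform weights reduce to average pooling.
    dim = len(frames[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, frames))
              for d in range(dim)]
    return pooled, weights


pooled, weights = attentive_temporal_pooling([[1.0, 2.0], [3.0, 4.0]],
                                             [0.5, 0.5])
print(pooled, weights)
```

With a zero attention vector every frame receives the same weight and the result equals global average pooling; a learned vector lets the model emphasize frames that carry more evidence of spoofing artifacts.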

Author countries

USA, China