Synthetic Voice Spoofing Detection Based On Online Hard Example Mining

Authors: Chenlei Hu, Ruohua Zhou

Published: 2022-09-23 13:32:15+00:00

AI Summary

This paper proposes an Online Hard Example Mining (OHEM) algorithm to improve the detection of unknown voice spoofing attacks in automatic speaker verification. By concentrating training on hard-to-classify samples, OHEM addresses the imbalance between easy and hard samples, and the presented system achieves an equal error rate (EER) of 0.77% on the evaluation set of the ASVspoof 2019 Challenge logical access scenario.

Abstract

The automatic speaker verification spoofing (ASVspoof) challenge series is crucial for raising awareness of spoofing and advancing the development of countermeasures. Although recent ASVspoof 2019 results show that most attacks can be identified reliably, models still perform poorly on some attacks. This paper presents the Online Hard Example Mining (OHEM) algorithm for detecting unknown voice spoofing attacks. OHEM is used to overcome the imbalance between easy and hard samples in the dataset. The presented system achieves an equal error rate (EER) of 0.77% on the evaluation set of the ASVspoof 2019 Challenge logical access scenario.


Key findings
The OHEM algorithm significantly reduced the EER across the evaluated models, with the best single model reaching an EER of 2.13%. A fusion of three models using OHEM reached an EER of 0.77%, outperforming several existing systems on the ASVspoof 2019 LA evaluation set.
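Because every result above is reported as an EER, a small illustration of the metric may help. The sketch below is not the official ASVspoof scoring tool; it assumes that a higher score means "more likely bona fide", and the function name `compute_eer` and the toy scores are hypothetical. It sweeps the observed scores as thresholds and reports the operating point where the false acceptance rate and the false rejection rate are closest.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Approximate the equal error rate (EER) from two score lists.

    Assumption: higher score = more likely bona fide.
    Returns (eer, threshold) at the threshold where the false acceptance
    rate (FAR) and false rejection rate (FRR) are closest.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None  # (gap between FAR and FRR, EER estimate, threshold)
    for t in thresholds:
        # FRR: fraction of bona fide trials rejected (score below threshold)
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # FAR: fraction of spoofed trials accepted (score at or above threshold)
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores, invented purely for illustration
bonafide = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
eer, threshold = compute_eer(bonafide, spoof)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

In this toy case one bona fide trial and one spoofed trial are misclassified at threshold 0.6, so both error rates equal 25%. An EER of 0.77% means both rates are below one percent at the balanced threshold.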
Approach
The authors address the imbalance between easy and hard samples in voice spoofing detection datasets by using Online Hard Example Mining (OHEM). OHEM selectively trains on difficult-to-classify samples, improving the model's ability to identify unseen attacks. This is combined with various pre-trained models for feature extraction and classification.
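The core idea of OHEM can be sketched in a few lines. This is a minimal illustration of one common OHEM formulation, not the authors' implementation: it computes a per-sample cross-entropy loss within a mini-batch, keeps only the highest-loss samples, and averages the loss over those. The function names and the `keep_ratio` parameter are assumptions made for this example; the summary does not state how the paper selects hard samples.

```python
import math


def per_sample_cross_entropy(logits, labels):
    """Cross-entropy loss of each sample, computed from raw class logits."""
    losses = []
    for row, label in zip(logits, labels):
        # Numerically stable log-sum-exp over the class logits
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        losses.append(log_sum_exp - row[label])
    return losses


def ohem_loss(logits, labels, keep_ratio=0.5):
    """Average the loss over only the hardest samples in the batch.

    keep_ratio is a hypothetical hyperparameter: the fraction of the
    batch (ranked by loss, highest first) that contributes to training.
    Returns (mean loss over hard samples, sorted indices of hard samples).
    """
    losses = per_sample_cross_entropy(logits, labels)
    n_keep = max(1, int(len(losses) * keep_ratio))
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    hard_idx = ranked[:n_keep]
    mean_hard = sum(losses[i] for i in hard_idx) / n_keep
    return mean_hard, sorted(hard_idx)


# Toy batch: class 0 = spoof, class 1 = bona fide (labels invented for illustration)
logits = [[4.0, -4.0], [0.2, 0.0], [-3.0, 3.0], [1.0, 0.5]]
labels = [0, 1, 1, 0]
loss, hard = ohem_loss(logits, labels, keep_ratio=0.5)
print(f"hard samples: {hard}, OHEM loss: {loss:.4f}")
```

Samples 0 and 2 are classified confidently and correctly, so they are dropped; only the uncertain or misclassified samples 1 and 3 drive the update. In a real training loop the same selection would be applied to the loss tensor before backpropagation.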
Datasets
ASVspoof 2019 Logical Access (LA) database (training, development, and evaluation sets)
Model(s)
ResNet-18, ResNet-50, SE-Res2Net, RawNet2, and a novel Raw-res2net architecture (a combination of RawNet2 and Res2Net)
Author countries
UNKNOWN