Synthetic speech detection using meta-learning with prototypical loss


Authors: Monisankha Pal, Aditya Raikar, Ashish Panda, Sunil Kumar Kopparapu

Published: 2022-01-24 06:01:06+00:00

AI Summary

This research addresses the generalization problem in synthetic speech detection by using a prototypical loss within a meta-learning paradigm. The approach learns an embedding space that separates genuine from synthetic speech, with the goal of improving detection of unseen spoofing attacks.

Abstract

Recent speech spoofing countermeasures still generalize poorly to unseen spoofing attacks. This is one of the key issues in the ASVspoof challenges, especially given the rapid development of diverse, high-quality spoofing algorithms. In this work, we address the generalizability of spoofing detection by proposing prototypical loss under the meta-learning paradigm to mimic the unseen test scenario during training. Prototypical loss, with its metric-learning objective, learns the embedding space directly and emerges as a strong alternative to the prevailing classification loss functions. We propose an anti-spoofing system based on a squeeze-excitation residual network (SE-ResNet) architecture with prototypical loss. We demonstrate that the proposed single system, without any data augmentation, achieves performance competitive with the recent best anti-spoofing systems on the ASVspoof 2019 logical access (LA) task. Furthermore, the proposed system with data augmentation outperforms the best ASVspoof 2021 challenge baseline in both the progress and evaluation phases of the LA task. On the ASVspoof 2019 and 2021 LA evaluation sets, we attain relative min-tDCF improvements of 68.4% and 3.6%, respectively, over the challenge best baselines.

Key findings

The proposed system, without data augmentation, achieved performance competitive with state-of-the-art systems on the ASVspoof 2019 LA task. With data augmentation, it outperformed the best ASVspoof 2021 challenge baseline in both the progress and evaluation phases. On the ASVspoof 2019 and 2021 LA evaluation sets, it achieved relative min-tDCF improvements of 68.4% and 3.6%, respectively, over the challenge best baselines.
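The min-tDCF figures above are relative improvements. Because min-tDCF is a lower-is-better cost, a relative improvement is the fractional reduction with respect to the baseline value. The helper below is a minimal illustration; the function name and the input values are hypothetical and are not results from the paper.

```python
def relative_improvement(baseline: float, proposed: float) -> float:
    """Fractional reduction of a lower-is-better metric (e.g. min-tDCF) vs. a baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - proposed) / baseline


# Hypothetical values for illustration only; they are not numbers from the paper.
print(f"{relative_improvement(0.25, 0.10):.1%}")
```

A negative result would mean the proposed system has a higher (worse) cost than the baseline.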

Approach

The authors propose an anti-spoofing system using a squeeze-excitation ResNet (SE-ResNet) architecture trained with prototypical loss. This meta-learning approach mimics unseen test scenarios during training, enhancing generalization to diverse spoofing attacks.
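As a minimal NumPy sketch of how a prototypical loss scores one training episode, assume each episode supplies a support set and a query set of embeddings with integer class labels (for example 0 for bona fide and 1 for spoofed speech). The function name, the squared Euclidean distance, and this label scheme are illustrative assumptions; the summary does not say how the authors compose episodes or which classes they use. Each class prototype is the mean of its support embeddings, and each query is classified by a softmax over its negative distances to the prototypes.

```python
import numpy as np


def prototypical_loss(support, support_labels, query, query_labels):
    """Prototypical loss for one episode (illustrative sketch).

    support: (Ns, D) embeddings, support_labels: (Ns,) integer class ids
    query:   (Nq, D) embeddings, query_labels:   (Nq,) integer class ids
    Every query label is assumed to also appear in support_labels.
    """
    classes = np.unique(support_labels)  # sorted class ids present in this episode
    # Prototype = mean support embedding of each class: (K, D).
    prototypes = np.stack([support[support_labels == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from every query to every prototype: (Nq, K).
    dists = ((query[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    # Log-softmax over negative distances, computed stably.
    logits = -dists
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Map each query label to the row index of its prototype.
    target = np.searchsorted(classes, query_labels)
    # Mean cross-entropy of the queries against their true prototypes.
    return -log_probs[np.arange(len(query)), target].mean()


# Hypothetical episode: 8 random 16-dim embeddings split into support and query halves.
rng = np.random.default_rng(0)
emb = rng.standard_normal((8, 16))
lab = np.array([0, 0, 0, 0, 1, 1, 1, 1])
loss = prototypical_loss(emb[::2], lab[::2], emb[1::2], lab[1::2])
print(loss)
```

Because classification is defined by distances to prototypes recomputed in every episode rather than by fixed output weights, the loss pushes same-class utterances together in the embedding space; that is the property the episodic setup is meant to exploit when test data contain unseen attacks.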

Datasets

ASVspoof 2019 and ASVspoof 2021 logical access (LA) tasks.

Model(s)

SE-ResNet34 architecture.
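The "SE" in SE-ResNet34 refers to squeeze-and-excitation blocks, which reweight the channels of a feature map. In the SE-ResNet design of Hu et al., the block is applied to the output of each residual branch before it is added to the identity path. The NumPy sketch below shows one block's forward pass only; the function name, the weight shapes, and the reduction ratio r = 16 (the default in the original SE paper) are assumptions, not details confirmed for this system.

```python
import numpy as np


def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-excitation forward pass (illustrative sketch).

    x:  (C, H, W) feature map
    w1: (C // r, C), b1: (C // r,)  bottleneck fully connected layer
    w2: (C, C // r), b2: (C,)       expansion fully connected layer
    """
    # Squeeze: global average pooling turns each channel into one scalar -> (C,).
    z = x.mean(axis=(1, 2))
    # Excitation: FC -> ReLU -> FC -> sigmoid gives a weight in (0, 1) per channel.
    s = np.maximum(w1 @ z + b1, 0.0)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s + b2)))
    # Scale: multiply every spatial position of each channel by its gate.
    return x * gate[:, None, None]


# Hypothetical sizes: 32 channels, reduction ratio 16, random small weights.
rng = np.random.default_rng(0)
C, r = 32, 16
x = rng.standard_normal((C, 8, 8))
w1 = 0.1 * rng.standard_normal((C // r, C))
w2 = 0.1 * rng.standard_normal((C, C // r))
y = se_block(x, w1, np.zeros(C // r), w2, np.zeros(C))
print(y.shape)
```

The block leaves the tensor shape unchanged, so it can be inserted into each residual unit without altering the rest of the network.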

Author countries

India
