Speaker-Aware Anti-Spoofing

Authors: Xuechen Liu, Md Sahidullah, Kong Aik Lee, Tomi Kinnunen

Published: 2023-03-02 10:14:59+00:00

AI Summary

This paper introduces speaker-aware anti-spoofing, a voice spoofing countermeasure that uses prior knowledge of the target speaker. It extends the AASIST model by integrating target speaker information at the frame and utterance levels, achieving maximum relative improvements of 25.1% in EER and 11.6% in minimum t-DCF over a speaker-independent baseline.

Abstract

We address speaker-aware anti-spoofing, where prior knowledge of the target speaker is incorporated into a voice spoofing countermeasure (CM). In contrast to the frequently used speaker-independent solutions, we train the CM in a speaker-conditioned way. As a proof of concept, we consider a speaker-aware extension to the state-of-the-art AASIST (audio anti-spoofing using integrated spectro-temporal graph attention networks) model. To this end, we consider two alternative strategies to incorporate target speaker information at the frame and utterance levels, respectively. The experimental results on a custom protocol based on the ASVspoof 2019 dataset indicate the efficiency of the speaker information via enrollment: we obtain maximum relative improvements of 25.1% and 11.6% in equal error rate (EER) and minimum tandem detection cost function (t-DCF) over a speaker-independent baseline, respectively.

Key findings

Integrating speaker information at the spectral level achieved the best performance, with maximum relative improvements of 25.1% in EER and 11.6% in minimum t-DCF over the speaker-independent baseline. An ablation study showed that performance degrades when the speaker identity is mismatched, although the model still outperforms the baseline in many cases.
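The percentages above are relative improvements, that is, the reduction in a metric expressed as a fraction of the baseline value. A minimal sketch of that arithmetic follows. The function name and the input values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def relative_improvement(baseline: float, system: float) -> float:
    """Relative reduction of an error metric (lower is better), in percent."""
    if baseline <= 0:
        raise ValueError("baseline metric must be positive")
    return (baseline - system) / baseline * 100.0


# Made-up values, NOT taken from the paper: a baseline EER of 2.0%
# and a speaker-aware EER of 1.5% correspond to a 25% relative improvement.
example = relative_improvement(baseline=2.0, system=1.5)
```

The same function applies to minimum t-DCF, since both metrics are error-type measures where smaller values are better.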

Approach

The authors extend the AASIST audio anti-spoofing model by integrating speaker embeddings from enrollment data. Two strategies are explored: integrating speaker embeddings at the frame level (by extending the feature maps) and at the utterance level (by appending the embedding to the final feature vector).
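The two integration points can be illustrated with a small, framework-free sketch. It assumes that "extending the feature maps" means concatenating the speaker embedding to every frame's feature vector, and that utterance-level integration is a plain concatenation; the summary does not spell out the exact operations, so the function names, shapes, and values below are hypothetical rather than the authors' implementation.

```python
from typing import List

Vector = List[float]
FeatureMap = List[Vector]  # one feature vector per frame


def integrate_frame_level(feature_map: FeatureMap, speaker_emb: Vector) -> FeatureMap:
    """Extend every frame's features with a copy of the speaker embedding."""
    # Assumption: concatenation along the feature axis, repeated for each frame.
    return [frame + speaker_emb for frame in feature_map]


def integrate_utterance_level(utterance_vec: Vector, speaker_emb: Vector) -> Vector:
    """Append the speaker embedding to the final utterance-level vector."""
    return utterance_vec + speaker_emb


# Toy values, purely illustrative: 3 frames of 2 features, 2-dim embedding.
frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
speaker = [1.0, -1.0]

frame_aware = integrate_frame_level(frames, speaker)              # 3 frames, 4 features each
utt_aware = integrate_utterance_level([0.7, 0.8, 0.9], speaker)   # 5-dim vector
```

In the frame-level variant, every layer after the integration point sees the speaker information; in the utterance-level variant, only the final classifier does.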

Datasets

ASVspoof 2019 LA dataset, VoxCeleb, LibriSpeech

Model(s)

AASIST (Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks), ECAPA-TDNN (for speaker embedding extraction)

Author countries

Finland, France, India, Singapore
