Meta-Learning Approaches for Improving Detection of Unseen Speech Deepfakes

Authors: Ivan Kukanov, Janne Laakkonen, Tomi Kinnunen, Ville Hautamäki

Published: 2024-10-27 20:14:32+00:00

AI Summary

This paper tackles the challenge of generalizing speech deepfake detection to unseen attacks using meta-learning. By learning attack-invariant features, the approach adapts to new attacks with minimal samples, improving Equal Error Rate (EER) from 21.67% to 10.42% on the InTheWild dataset using only 96 unseen samples.

Abstract

Current speech deepfake detection approaches perform satisfactorily against known adversaries; however, generalization to unseen attacks remains an open challenge. The proliferation of speech deepfakes on social media underscores the need for systems that can generalize to attacks not observed during training. We address this problem from the perspective of meta-learning, aiming to learn attack-invariant features in order to adapt to unseen attacks with very few samples available. This approach is promising because generating a large-scale training dataset is often expensive or infeasible. Our experiments demonstrated an improvement in the Equal Error Rate (EER) from 21.67% to 10.42% on the InTheWild dataset, using just 96 samples from the unseen dataset. Continuous few-shot adaptation ensures that the system remains up-to-date.


Key findings

Meta-learning approaches, particularly ProtoMAML, significantly improved the detection of unseen speech deepfakes: using only 96 samples from the unseen InTheWild dataset, the EER dropped from 21.67% to 10.42%. ProtoMAML showed better adaptation capabilities than ProtoNet, though at a higher computational cost.
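Since the headline result is stated as an EER, a brief sketch of how that metric is computed may be useful. This is a generic, illustrative implementation (the function name and threshold sweep are ours, not taken from the paper): EER is the operating point where the false-accept rate on spoofed samples equals the false-reject rate on bona fide samples.

```python
import numpy as np

def compute_eer(bonafide_scores, spoof_scores):
    """Equal Error Rate: the threshold at which the false-accept rate
    (spoof scored as bona fide) equals the false-reject rate
    (bona fide scored as spoof). Higher score = more bona fide."""
    # Sweep candidate thresholds over all observed scores.
    thresholds = np.sort(np.concatenate([bonafide_scores, spoof_scores]))
    far = np.array([(spoof_scores >= t).mean() for t in thresholds])    # false accepts
    frr = np.array([(bonafide_scores < t).mean() for t in thresholds])  # false rejects
    idx = np.argmin(np.abs(far - frr))  # closest crossing point
    return (far[idx] + frr[idx]) / 2

# Perfectly separable scores give an EER of 0.
print(compute_eer(np.array([0.9, 0.8, 0.7, 0.6]),
                  np.array([0.1, 0.2, 0.3, 0.4])))  # → 0.0
```

In practice, toolkits interpolate the DET curve for a smoother estimate, but the discrete sweep above conveys the idea.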
Approach

The authors employ meta-learning, specifically ProtoNet and ProtoMAML, to learn attack-invariant features. These models adapt to unseen speech deepfakes via few-shot learning, requiring only a small number of samples from the unseen attack to update the model parameters.
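To make the ProtoNet side of this concrete, here is a minimal sketch of prototypical classification, assuming utterance embeddings (which in the paper would come from a front-end such as Wav2Vec-AASIST; the function and variable names here are illustrative, not the authors' code). Each class prototype is the mean of its few-shot support embeddings, and queries are assigned to the nearest prototype.

```python
import numpy as np

def prototype_classify(support_emb, support_labels, query_emb):
    """ProtoNet-style few-shot classification: compute one prototype per
    class as the mean of its support embeddings, then assign each query
    to the class with the nearest prototype (squared Euclidean distance)."""
    classes = np.unique(support_labels)
    prototypes = np.stack(
        [support_emb[support_labels == c].mean(axis=0) for c in classes]
    )
    # Pairwise squared distances: (n_query, n_classes)
    dists = ((query_emb[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    return classes[np.argmin(dists, axis=1)]

# Toy 2-D example: two support samples per class (e.g. bona fide = 0, spoof = 1).
support = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]])
labels = np.array([0, 0, 1, 1])
queries = np.array([[0.95, 1.0], [0.0, 0.05]])
print(prototype_classify(support, labels, queries))  # → [1 0]
```

ProtoMAML extends this by using the prototypes to initialize a linear classification head and then taking a few gradient steps on the support set, which is what gives it the stronger adaptation (and higher cost) noted in the key findings.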
Datasets

ASVspoof2019 LA, ASVspoof2021 LA, ASVspoof2021 DF, InTheWild, FakeAVCeleb
Model(s)

Wav2Vec-AASIST (a self-supervised learning model with a graph attention network backend), ProtoNet, ProtoMAML
Author countries

Singapore, Finland