Generalizable speech deepfake detection via meta-learned LoRA

Authors: Janne Laakkonen, Ivan Kukanov, Ville Hautamäki

Published: 2025-02-15 16:02:54+00:00

AI Summary

This paper proposes a novel approach for generalizable speech deepfake detection using meta-learning with Low-Rank Adaptation (LoRA) adapters. The method improves generalization by learning structure common to different deepfake attack types, reducing the need for extensive retraining when new attacks are encountered.

Abstract

Generalizable deepfake detection can be formulated as a detection problem where the labels (bonafide and fake) are fixed but distributional drift affects the deepfake set. We can always train our detector with a selected set of attacks and bonafide data, but an attacker can generate new attacks simply by retraining their generator with a different seed. One reasonable approach is to pool all the different attack types available at training time. Our proposed approach is to utilize meta-learning in combination with LoRA adapters to learn the structure in the training data that is common to all attack types.


Key findings
The proposed method significantly improves generalization performance compared to baseline models, achieving comparable or better results with drastically fewer trainable parameters. The combination of LoRA and MLDG is crucial for effective zero-shot adaptation to unseen attacks.
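To make the "drastically fewer trainable parameters" claim concrete, here is an illustrative count for a single dense projection matrix: LoRA replaces the full update dW (d_out × d_in) with a low-rank factorization B @ A. The dimension d = 1024 matches the XLSR-53 hidden size, but the rank r = 8 is an assumed value, not taken from the paper:

```python
# Illustrative LoRA parameter count for one square projection matrix.
# d = 1024 matches the XLSR-53 hidden size; r = 8 is an assumed rank.
d, r = 1024, 8
full = d * d            # trainable params when fully fine-tuning the matrix
lora = r * d + d * r    # trainable params for A (r x d) plus B (d x r)
print(full, lora, lora / full)   # 1048576 16384 0.015625
```

At this rank, the adapter trains about 1.6% of the parameters of the matrix it adapts; the saving compounds across every attention layer the adapters are attached to.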
Approach
The authors combine Meta-Learning for Domain Generalization (MLDG) with LoRA adapters. LoRA fine-tunes a pre-trained Wav2Vec 2.0 model efficiently by updating only low-rank matrices, while MLDG trains the model to generalize across attack types by simulating domain shifts during training.
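The interplay of the two ideas can be sketched with a toy NumPy "detector": a frozen weight stands in for the pre-trained backbone, only the LoRA factors (and a bias) are trained, and each MLDG round holds one synthetic attack domain out as meta-test. All dimensions, learning rates, and the synthetic domains are illustrative assumptions, and the inner step uses a first-order approximation rather than the full second-order MLDG gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 2                                # toy feature dim and LoRA rank (illustrative)

W0 = rng.normal(size=(1, d))                # frozen weight, stands in for the backbone
A = rng.normal(size=(r, d)) / np.sqrt(d)    # trainable LoRA factor
B = np.zeros((1, r))                        # standard LoRA init: B = 0, training starts at W0
b = 0.0                                     # trainable bias

def loss_and_grads(A, B, b, X, y):
    """BCE of sigmoid((W0 + B @ A) x + b) and its gradients w.r.t. A, B, b."""
    z = X @ (W0 + B @ A).T + b
    p = 1.0 / (1.0 + np.exp(-z))
    loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    dW = ((p - y).T @ X) / len(X)           # gradient w.r.t. the effective weight
    return loss, B.T @ dW, dW @ A.T, float(np.mean(p - y))

def make_domain(shift, n=200):
    """One synthetic 'attack type': fakes are bonafide features plus a domain shift."""
    Xb, Xf = rng.normal(size=(n, d)), rng.normal(size=(n, d)) + shift
    return np.vstack([Xb, Xf]), np.vstack([np.zeros((n, 1)), np.ones((n, 1))])

domains = [make_domain(rng.normal(scale=0.5, size=d) + 1.0) for _ in range(4)]

alpha, gamma, beta = 0.1, 0.1, 1.0          # inner lr, outer lr, meta-test weight
for _ in range(300):
    # MLDG: hold one attack domain out as meta-test, train on the rest.
    idx = rng.permutation(len(domains))
    Xs = np.vstack([domains[i][0] for i in idx[:-1]])
    ys = np.vstack([domains[i][1] for i in idx[:-1]])
    Xv, yv = domains[idx[-1]]

    _, dA1, dB1, db1 = loss_and_grads(A, B, b, Xs, ys)
    # Virtual inner step, then the meta-test gradient at the updated parameters
    # (first-order approximation of the MLDG objective).
    _, dA2, dB2, db2 = loss_and_grads(A - alpha * dA1, B - alpha * dB1,
                                      b - alpha * db1, Xv, yv)
    A -= gamma * (dA1 + beta * dA2)
    B -= gamma * (dB1 + beta * dB2)
    b -= gamma * (db1 + beta * db2)

# Zero-shot check on an unseen attack domain.
Xn, yn = make_domain(rng.normal(scale=0.5, size=d) + 1.0)
acc = np.mean(((Xn @ (W0 + B @ A).T + b) > 0) == (yn > 0.5))
```

The held-out meta-test loss rewards LoRA updates that also help on a domain not seen in the inner step, which is the mechanism the paper relies on for zero-shot generalization to unseen attacks; the real system applies this to adapters inside the Wav2Vec 2.0 self-attention layers rather than a single linear model.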
Datasets
ASVspoof 2019 LA (train, dev, eval), ASVspoof 2021 LA eval, ASVspoof 2021 DF eval, InTheWild, FakeAVCeleb, ASVspoof 5 LA eval
Model(s)
Wav2Vec 2.0 (XLSR-53) with AASIST back-end, LoRA adapters integrated into the Wav2Vec 2.0 encoder's self-attention modules.
Author countries
Finland, Singapore