Mixture of Low-Rank Adapter Experts in Generalizable Audio Deepfake Detection

Authors: Janne Laakkonen, Ivan Kukanov, Ville Hautamäki

Published: 2025-09-17 10:13:58+00:00

AI Summary

This paper proposes a Mixture-of-Low-Rank-Adapter-Experts (MoE-LoRA) approach for generalizable audio deepfake detection. The method integrates multiple low-rank adapters into a Wav2Vec2 model, using a routing mechanism to selectively activate specialized adapters for improved adaptability to diverse deepfake attacks. Experimental results demonstrate that MoE-LoRA significantly outperforms standard fine-tuning in both in-domain and out-of-domain scenarios.

Abstract

Foundation models such as Wav2Vec2 excel at representation learning in speech tasks, including audio deepfake detection. However, after being fine-tuned on a fixed set of bonafide and spoofed audio clips, they often fail to generalize to novel deepfake methods not represented in training. To address this, we propose a mixture-of-LoRA-experts approach that integrates multiple low-rank adapters (LoRA) into the model's attention layers. A routing mechanism selectively activates specialized experts, enhancing adaptability to evolving deepfake attacks. Experimental results show that our method outperforms standard fine-tuning in both in-domain and out-of-domain scenarios, reducing equal error rates relative to baseline models. Notably, our best MoE-LoRA model lowers the average out-of-domain EER from 8.55% to 6.08%, demonstrating its effectiveness in achieving generalizable audio deepfake detection.


Key findings

The MoE-LoRA model significantly reduced the average out-of-domain Equal Error Rate (EER) from 8.55% to 6.08%, outperforming both standard fine-tuning and single-LoRA approaches. The method demonstrates improved generalization to unseen deepfake attacks and diverse acoustic conditions.
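Since the headline result is reported as an Equal Error Rate, it may help to recall how EER is computed from detector scores: it is the operating point where the false-acceptance rate (spoofs accepted) equals the false-rejection rate (bonafide rejected). The sketch below is illustrative only; the score values are made up and this is not the paper's evaluation code.

```python
# Hypothetical detector scores (higher = more likely bonafide); values are illustrative.
bonafide_scores = [0.9, 0.8, 0.75, 0.6, 0.55]
spoof_scores = [0.1, 0.2, 0.4, 0.65, 0.3]

def eer(bonafide, spoof):
    """Equal Error Rate: sweep thresholds, return (FAR + FRR) / 2 where they cross."""
    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(bonafide + spoof)):
        far = sum(s >= t for s in spoof) / len(spoof)       # spoofs accepted
        frr = sum(s < t for s in bonafide) / len(bonafide)  # bonafide rejected
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer

print(eer(bonafide_scores, spoof_scores))  # → 0.2, i.e. 20% EER on this toy data
```

On real evaluation sets the threshold sweep is done over many thousands of scores, but the crossing-point logic is the same.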
Approach

The authors address the generalization problem in audio deepfake detection by integrating multiple low-rank adapters (LoRA) into the attention layers of a Wav2Vec2 model. A routing mechanism selects a subset of these specialized 'expert' adapters based on the input audio, enhancing the model's ability to adapt to various deepfake techniques.
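The routing-over-LoRA-experts idea can be sketched numerically: a frozen base projection is augmented by several low-rank expert updates, and a gate picks the top-k experts per input. This is a minimal numpy sketch under assumed dimensions and a simple softmax-over-top-k router; it is not the authors' implementation, and the rank, expert count, and gating details here are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_experts, top_k = 16, 4, 4, 2  # illustrative dims: model, LoRA rank, experts, routed experts

W = rng.normal(size=(d, d))                    # frozen base weight (e.g. an attention projection)
A = rng.normal(size=(n_experts, r, d)) * 0.01  # per-expert LoRA down-projections
B = np.zeros((n_experts, d, r))                # per-expert LoRA up-projections (zero-init, as in LoRA)
W_gate = rng.normal(size=(n_experts, d))       # router weights

def moe_lora_forward(x):
    """x: (d,) input. Route to top-k LoRA experts and add their low-rank updates."""
    logits = W_gate @ x
    top = np.argsort(logits)[-top_k:]            # indices of the k highest-scoring experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                         # softmax over the selected experts only
    out = W @ x                                  # frozen base path
    for g, e in zip(gates, top):
        out = out + g * (B[e] @ (A[e] @ x))      # gated low-rank expert update
    return out
```

With the standard zero initialization of `B`, the layer starts out identical to the frozen base model; training moves the expert updates away from zero while the router learns which experts to activate for which inputs.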
Datasets

ASVspoof 2019 LA (training, validation, evaluation), ASVspoof 2021 LA and DF, ASVspoof 5 LA, In-The-Wild, FakeAVCeleb
Model(s)

Wav2Vec2 XLSR-53, AASIST, Low-Rank Adapters (LoRA), Mixture-of-Experts (MoE) with LoRA
Author countries

Finland, Singapore