Mixture of Low-Rank Adapter Experts in Generalizable Audio Deepfake Detection
Authors: Janne Laakkonen, Ivan Kukanov, Ville Hautamäki
Published: 2025-09-17 10:13:58+00:00
AI Summary
This paper proposes a Mixture-of-Low-Rank-Adapter-Experts (MoE-LoRA) approach for generalizable audio deepfake detection. The method integrates multiple low-rank adapters into a Wav2Vec2 model, using a routing mechanism to selectively activate specialized adapters for improved adaptability to diverse deepfake attacks. Experimental results demonstrate that MoE-LoRA significantly outperforms standard fine-tuning in both in-domain and out-of-domain scenarios.
Abstract
Foundation models such as Wav2Vec2 excel at representation learning in speech tasks, including audio deepfake detection. However, after being fine-tuned on a fixed set of bonafide and spoofed audio clips, they often fail to generalize to novel deepfake methods not represented in training. To address this, we propose a mixture-of-LoRA-experts approach that integrates multiple low-rank adapters (LoRA) into the model's attention layers. A routing mechanism selectively activates specialized experts, enhancing adaptability to evolving deepfake attacks. Experimental results show that our method outperforms standard fine-tuning in both in-domain and out-of-domain scenarios, reducing equal error rates (EER) relative to baseline models. Notably, our best MoE-LoRA model lowers the average out-of-domain EER from 8.55% to 6.08%, demonstrating its effectiveness in achieving generalizable audio deepfake detection.
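To illustrate the idea described in the abstract, below is a minimal PyTorch sketch of a mixture-of-LoRA-experts layer wrapped around a frozen attention projection. This is not the authors' implementation: the class name MoELoRALinear, the utterance-level routing input, the top-k gating, and hyperparameters such as num_experts, rank, and alpha are illustrative assumptions; the paper only specifies that multiple low-rank adapters are attached to the attention layers and selected by a routing mechanism.

```python
# Minimal sketch of a mixture-of-LoRA-experts layer (illustrative, not the paper's code).
# Each expert is a low-rank adapter B_i A_i; a learned router mixes the experts'
# outputs on top of a frozen pretrained projection (e.g. a q/k/v matrix in Wav2Vec2 attention).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELoRALinear(nn.Module):
    def __init__(self, base_linear: nn.Linear, num_experts: int = 4,
                 rank: int = 8, top_k: int = 2, alpha: float = 16.0):
        super().__init__()
        self.base = base_linear                     # frozen pretrained projection
        self.base.weight.requires_grad_(False)
        in_f, out_f = base_linear.in_features, base_linear.out_features
        # Per-expert low-rank factors: A (rank x in), B (out x rank), B initialized to zero
        self.lora_A = nn.Parameter(torch.randn(num_experts, rank, in_f) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(num_experts, out_f, rank))
        self.router = nn.Linear(in_f, num_experts)  # routing mechanism (assumed form)
        self.top_k, self.scale = top_k, alpha / rank

    def forward(self, x):                           # x: (batch, time, in_f)
        # Route on an utterance-level summary of the input (an assumption for this sketch)
        gate = self.router(x.mean(dim=1))           # (batch, num_experts)
        topv, topi = gate.topk(self.top_k, dim=-1)
        weights = torch.zeros_like(gate).scatter_(-1, topi, F.softmax(topv, dim=-1))
        # Expert outputs: B_e (A_e x) for every expert e, then mixed by the router weights
        expert_out = torch.einsum('btd,erd,eor->bteo', x, self.lora_A, self.lora_B)
        lora_out = torch.einsum('bteo,be->bto', expert_out, weights)
        return self.base(x) + self.scale * lora_out
```

In a setup like this, only the adapter factors and the router are trained while the Wav2Vec2 backbone stays frozen, which keeps the trainable parameter count small while letting different experts specialize to different spoofing conditions.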