MoLEx: Mixture of LoRA Experts in Speech Self-Supervised Models for Audio Deepfake Detection
Authors: Zihan Pan, Sailor Hardik Bhupendra, Jinyang Wu
Published: 2025-09-11 06:18:29+00:00
AI Summary
This paper introduces MoLEx, a parameter-efficient framework for audio deepfake detection that combines Low-Rank Adaptation (LoRA) with a Mixture-of-Experts (MoE) router. MoLEx finetunes pre-trained self-supervised learning (SSL) models by updating only the selected LoRA experts, achieving state-of-the-art performance at reduced computational cost.
Abstract
While self-supervised learning (SSL)-based models have boosted audio deepfake detection accuracy, fully finetuning them is computationally expensive. To address this, we propose a parameter-efficient framework, called Mixture of LoRA Experts (MoLEx), that combines Low-Rank Adaptation (LoRA) with a Mixture-of-Experts (MoE) router. It preserves the pre-trained knowledge of SSL models while efficiently finetuning only selected experts, reducing training costs and maintaining robust performance. The observed utilization of experts during inference shows that the router reactivates the same experts for similar attacks but switches to other experts for novel spoofs, confirming MoLEx's domain-aware adaptability. MoLEx additionally offers flexibility for domain adaptation by allowing extra experts to be trained without modifying the entire model. We mainly evaluate our approach on the ASVspoof 5 dataset and achieve a state-of-the-art (SOTA) equal error rate (EER) of 5.56% on the evaluation set without augmentation.
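
The general idea of routing inputs to a few LoRA experts on top of a frozen layer can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the class name MoLoRALinear, the top-k softmax router, the rank, the scaling factor alpha, and the zero initialization of the up-projection are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class MoLoRALinear:
    """Frozen linear layer plus top-k routed LoRA experts (illustrative only)."""

    def __init__(self, d_in, d_out, num_experts=4, rank=2, top_k=1,
                 alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        # Stand-in for a pre-trained weight; it would stay frozen in training.
        self.W = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)
        # Per-expert low-rank factors: down-projection A, up-projection B.
        self.A = rng.standard_normal((num_experts, d_in, rank)) * 0.01
        # Assumption: B starts at zero so the layer initially equals the frozen one.
        self.B = np.zeros((num_experts, rank, d_out))
        # Assumption: a simple linear router producing one logit per expert.
        self.W_router = rng.standard_normal((d_in, num_experts)) * 0.01
        self.top_k = top_k
        self.scale = alpha / rank

    def route(self, x):
        """Return the top-k expert indices and their normalized gate weights."""
        logits = x @ self.W_router                           # (n, num_experts)
        top = np.argsort(-logits, axis=-1)[:, :self.top_k]   # (n, top_k)
        gates = softmax(np.take_along_axis(logits, top, axis=-1), axis=-1)
        return top, gates

    def __call__(self, x):
        y = x @ self.W                                       # frozen path
        top, gates = self.route(x)
        for slot in range(self.top_k):
            idx = top[:, slot]                               # expert id per row
            h = np.einsum("nd,ndr->nr", x, self.A[idx])      # down-project
            delta = np.einsum("nr,nro->no", h, self.B[idx])  # up-project
            y = y + gates[:, slot:slot + 1] * self.scale * delta
        return y


if __name__ == "__main__":
    layer = MoLoRALinear(d_in=8, d_out=5, num_experts=4, rank=2, top_k=2)
    frames = np.random.default_rng(1).standard_normal((3, 8))
    experts, weights = layer.route(frames)
    print("selected experts per frame:", experts.tolist())
    print("output shape:", layer(frames).shape)
```

In this sketch only A, B, and W_router would receive gradient updates, and only the experts selected for a given input contribute to its output. Adding an expert for a new domain would amount to appending one more (A, B) pair and one router column while leaving W untouched.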