Rehearsal with Auxiliary-Informed Sampling for Audio Deepfake Detection

Authors: Falih Gozi Febrinanto, Kristen Moore, Chandra Thapa, Jiangang Ma, Vidya Saikrishna, Feng Xia

Published: 2025-05-30 11:40:50+00:00

AI Summary

This paper introduces Rehearsal with Auxiliary-Informed Sampling (RAIS), a continual learning approach for audio deepfake detection that addresses the challenge of catastrophic forgetting. RAIS uses an auxiliary label generation network to improve sample diversity in the memory buffer, leading to better performance in handling new deepfake attacks.

Abstract

The performance of existing audio deepfake detection frameworks degrades when confronted with new deepfake attacks. Rehearsal-based continual learning (CL), which updates models using a limited set of old data samples, helps preserve prior knowledge while incorporating new information. However, existing rehearsal techniques do not effectively capture the diversity of audio characteristics, introducing bias and increasing the risk of forgetting. To address this challenge, we propose Rehearsal with Auxiliary-Informed Sampling (RAIS), a rehearsal-based CL approach for audio deepfake detection. RAIS employs a label generation network to produce auxiliary labels, guiding diverse sample selection for the memory buffer. Extensive experiments show RAIS outperforms state-of-the-art methods, achieving an average Equal Error Rate (EER) of 1.953% across five experiences. The code is available at: https://github.com/falihgoz/RAIS.


Key findings
RAIS outperforms state-of-the-art continual learning methods for audio deepfake detection, achieving an average Equal Error Rate (EER) of 1.953% across five experiences. Ablation studies confirm the importance of both the auxiliary label generation and the auxiliary-informed sampling components. The method shows minimal forgetting of previously learned knowledge.
Approach
RAIS employs a rehearsal-based continual learning strategy. It uses an Audio Auxiliary Label Generation Module (AAGM) to generate auxiliary labels, which guide a novel Auxiliary-Informed Sampling (AIS) method for selecting diverse and informative samples for the memory buffer. This helps maintain prior knowledge while adapting to new deepfake attacks.
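The core idea of auxiliary-informed sampling can be illustrated with a minimal sketch. The function below is a hypothetical simplification, not the paper's actual algorithm: it assumes each training sample carries an auxiliary label from the AAGM, groups samples by that label, and fills the memory buffer round-robin across groups so no single audio characteristic dominates the rehearsal set.

```python
import random
from collections import defaultdict

def auxiliary_informed_sampling(samples, aux_labels, buffer_size, seed=0):
    """Hypothetical sketch of diversity-guided buffer selection:
    stratify samples by auxiliary label, then fill the memory buffer
    round-robin across the label groups."""
    rng = random.Random(seed)

    # Group samples by their auxiliary label (assumed given per sample).
    groups = defaultdict(list)
    for sample, aux in zip(samples, aux_labels):
        groups[aux].append(sample)

    # Shuffle within each group so selection inside a group is random.
    pools = [rng.sample(g, len(g)) for g in groups.values()]

    # Round-robin over groups until the buffer is full or pools run dry.
    buffer, i = [], 0
    while len(buffer) < buffer_size and any(pools):
        pool = pools[i % len(pools)]
        if pool:
            buffer.append(pool.pop())
        i += 1
    return buffer
```

With two auxiliary-label groups and a buffer of four, this yields two samples per group, which is the diversity property the AIS component aims for; the actual RAIS selection criterion is more involved than this round-robin stand-in.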
Datasets
ASVspoof 2019 LA, VCC 2020, InTheWild, CFAD, and a dataset generated using the OpenAI TTS API with LJSpeech transcripts.
Model(s)
Wav2Vec2 (wav2vec2-xls-r-300m) and AASIST for feature extraction, plus a custom model with two linear layers for classification and auxiliary label generation.
Author countries
Australia