Advancing Continual Learning for Robust Deepfake Audio Classification

Authors: Feiyi Dong, Qingchen Tang, Yichen Bai, Zihan Wang

Published: 2024-07-14 07:32:24+00:00

AI Summary

This paper proposes CADE, a novel continual learning method for robust deepfake audio classification. CADE uses a fixed memory size to store past data, incorporates two distillation losses to retain old knowledge, and employs a novel embedding similarity loss for better positive sample alignment, outperforming baseline methods on the ASVspoof2019 dataset.

Abstract

The emergence of new spoofing attacks poses an increasing challenge to audio security. Current detection methods often falter when faced with unseen spoofing attacks. Traditional strategies, such as retraining with new data, are not always feasible due to extensive storage requirements. This paper introduces a novel continual learning method, the Continual Audio Defense Enhancer (CADE). First, by utilizing a fixed memory size to store randomly selected samples from previous datasets, our approach conserves resources and adheres to privacy constraints. Additionally, we apply two distillation losses in CADE. Through distillation in the classifiers, CADE ensures that the student model's outputs closely resemble those of the teacher model. This resemblance helps the model retain old information while facing unseen data. We further refine our model's performance with a novel embedding similarity loss that extends across multiple depth layers, facilitating superior positive sample alignment. Experiments conducted on the ASVspoof2019 dataset show that our proposed method outperforms the baseline methods.
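The fixed-size memory of randomly selected past samples described above can be sketched as a reservoir-sampling buffer. This is an illustrative, dependency-free sketch only: the abstract states random selection under a fixed memory budget, but the `ReplayMemory` class, reservoir-sampling choice, and method names here are assumptions, not the paper's implementation.

```python
import random

class ReplayMemory:
    """Fixed-capacity buffer holding a uniform random sample of all
    examples seen so far, via reservoir sampling (an assumption; the
    paper only specifies random selection with a fixed memory size)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.buffer = []      # stored examples, at most `capacity`
        self.seen = 0         # total examples offered so far
        self.rng = random.Random(seed)

    def add(self, example):
        """Offer one example; keep it with probability capacity/seen."""
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            # Evict a random slot so the buffer stays a uniform sample.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = example

    def sample(self, k):
        """Draw a minibatch of stored past examples for replay."""
        return self.rng.sample(self.buffer, min(k, len(self.buffer)))
```

During training on a new attack type, minibatches from `sample()` would be mixed with new data so the distillation losses have old-task inputs to operate on.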


Key findings
CADE consistently outperforms baseline methods (EWC, LWF, MAS, DFWF, and fine-tuning) across various experimental settings on the ASVspoof2019 dataset, achieving lower Equal Error Rates (EERs). The performance improvement is particularly noticeable when dealing with significantly different spoofing attacks. CADE also remains robust even with limited memory sizes in the replay-based strategy.
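The Equal Error Rate used as the metric above is the operating point where the false-acceptance rate (FAR) on spoofed audio equals the false-rejection rate (FRR) on genuine audio. A minimal sketch of computing it from detection scores (the function name and the simple threshold sweep are illustrative, not the paper's evaluation code):

```python
def compute_eer(genuine_scores, spoof_scores):
    """Sweep thresholds over all observed scores and return the rate
    at the point where FAR and FRR are closest (the EER)."""
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # FRR: genuine trials scored below the threshold (rejected).
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # FAR: spoof trials scored at/above the threshold (accepted).
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

A lower EER means the detector separates genuine from spoofed audio more cleanly; a perfectly separable score distribution yields an EER of 0.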
Approach
CADE addresses catastrophic forgetting in audio deepfake detection using a continual learning approach. It combines replay-based learning with a fixed memory size for storing past data and employs two distillation losses (knowledge distillation and attention distillation) along with a novel embedding similarity loss to maintain performance on previously seen data while adapting to new unseen attacks.
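The combined objective described above (task loss plus classifier distillation plus an embedding similarity term across depth layers) might be sketched as follows. This is a rough, dependency-free illustration: the weights `alpha` and `beta`, the KL form of the distillation loss, the cosine form of the embedding term, and all function names are assumptions, since the summary does not give the paper's exact formulas (and attention distillation is omitted here).

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Classifier distillation: KL(teacher || student) on softened
    outputs, scaled by T^2 as is conventional in knowledge distillation."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return temperature ** 2 * kl

def cosine_embedding_loss(student_emb, teacher_emb):
    """Embedding similarity term: 1 - cosine similarity, pulling the
    student's intermediate embedding toward the teacher's."""
    dot = sum(s * t for s, t in zip(student_emb, teacher_emb))
    ns = math.sqrt(sum(s * s for s in student_emb))
    nt = math.sqrt(sum(t * t for t in teacher_emb))
    return 1.0 - dot / (ns * nt)

def total_loss(task_loss, s_logits, t_logits, s_embs, t_embs,
               alpha=1.0, beta=0.5):
    """Illustrative overall objective: task loss + classifier
    distillation + embedding similarity summed over depth layers."""
    emb_term = sum(cosine_embedding_loss(se, te)
                   for se, te in zip(s_embs, t_embs))
    return task_loss + alpha * kd_loss(s_logits, t_logits) + beta * emb_term
```

In this framing the frozen teacher is the model before the new attack type is introduced; both distillation terms vanish when the student exactly reproduces the teacher, which is what preserves old knowledge while the task loss adapts to new data.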
Datasets
ASVspoof2019 dataset (Logical Access subset, specifically A1 to A6 spoofing techniques)
Model(s)
RawNet2 and LFCC-LCNN. The paper also compares against the EWC, MAS, LWF, and DFWF continual learning methods.
Author countries
Australia