Region-Based Optimization in Continual Learning for Audio Deepfake Detection

Authors: Yujie Chen, Jiangyan Yi, Cunhang Fan, Jianhua Tao, Yong Ren, Siding Zeng, Chu Yuan Zhang, Xinrui Yan, Hao Gu, Jun Xue, Chenglong Wang, Zhao Lv, Xiaohui Zhang

Published: 2024-12-16 08:34:09+00:00

Comment: Accepted by AAAI 2025

AI Summary

The paper introduces Region-Based Optimization (RegO), a continual learning method for audio deepfake detection that addresses the drop in model performance on diverse and evolving deepfakes. RegO uses the Fisher information matrix to measure neuron importance for real and fake audio detection, divides the neurons into four regions, and applies a different gradient optimization strategy to each region. An Ebbinghaus forgetting mechanism releases redundant neurons left over from old tasks. The method achieves a 21.3% improvement in EER over RWM, the state-of-the-art continual learning approach for audio deepfake detection, and shows potential in other tasks such as image recognition.

Abstract

Rapid advancements in speech synthesis and voice conversion bring convenience but also new security risks, creating an urgent need for effective audio deepfake detection. Although current models perform well, their effectiveness diminishes when confronted with the diverse and evolving nature of real-world deepfakes. To address this issue, we propose a continual learning method named Region-Based Optimization (RegO) for audio deepfake detection. Specifically, we use the Fisher information matrix to measure important neuron regions for real and fake audio detection, dividing them into four regions. First, we directly fine-tune the less important regions to quickly adapt to new tasks. Next, we apply gradient optimization in parallel for regions important only to real audio detection, and in orthogonal directions for regions important only to fake audio detection. For regions that are important to both, we use sample proportion-based adaptive gradient optimization. This region-adaptive optimization ensures an appropriate trade-off between memory stability and learning plasticity. Additionally, to address the increase of redundant neurons from old tasks, we further introduce the Ebbinghaus forgetting mechanism to release them, thereby promoting the capability of the model to learn more generalized discriminative features. Experimental results show our method achieves a 21.3% improvement in EER over the state-of-the-art continual learning approach RWM for audio deepfake detection. Moreover, the effectiveness of RegO extends beyond the audio deepfake detection domain, showing potential significance in other tasks, such as image recognition. The code is available at https://github.com/cyjie429/RegO
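
The four-region split described above can be illustrated with a minimal sketch. It assumes a diagonal empirical Fisher estimate (the mean of squared per-sample gradients) and a quantile threshold for deciding what counts as "important". Both choices are assumptions made for illustration and are not taken from the paper; the names `diagonal_fisher`, `partition_regions`, and `quantile` are hypothetical.

```python
import numpy as np

def diagonal_fisher(per_sample_grads):
    """Diagonal empirical Fisher: mean of squared per-sample gradients.

    per_sample_grads: array of shape (n_samples, n_params).
    """
    g = np.asarray(per_sample_grads, dtype=float)
    return np.mean(g ** 2, axis=0)

def partition_regions(fisher_real, fisher_fake, quantile=0.5):
    """Label each parameter with one of four regions.

    0: less important for both classes
    1: important only for real audio
    2: important only for fake audio
    3: important for both
    The quantile threshold is an assumption of this sketch.
    """
    fisher_real = np.asarray(fisher_real, dtype=float)
    fisher_fake = np.asarray(fisher_fake, dtype=float)
    imp_real = fisher_real > np.quantile(fisher_real, quantile)
    imp_fake = fisher_fake > np.quantile(fisher_fake, quantile)
    return imp_real.astype(int) + 2 * imp_fake.astype(int)
```

In this sketch, `fisher_real` and `fisher_fake` would be computed separately from gradients on real and fake samples, so that each parameter receives one importance score per class before the two are combined into a region label.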


Key findings
RegO achieved a 21.3% improvement in Equal Error Rate (EER) over the state-of-the-art continual learning method RWM for audio deepfake detection. The method effectively balances memory stability and learning plasticity, demonstrating strong potential in both cross-lingual and cross-task deepfake detection scenarios. Furthermore, RegO exhibited competitive results in a general study on image recognition, indicating its broader applicability beyond audio deepfake detection.
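
For readers unfamiliar with the metric, EER is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below approximates it by sweeping thresholds over the observed scores. It assumes that higher scores mean "more likely real" and that labels are 1 for real and 0 for fake; these conventions and the function names are illustrative, not taken from the paper. The summary does not state whether the 21.3% figure is a relative or an absolute reduction, so `relative_improvement` shows only the relative reading.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Approximate EER by sweeping thresholds over the observed scores.

    scores: higher means more likely real audio (assumed convention).
    labels: 1 for real, 0 for fake (assumed convention).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    real = scores[labels == 1]
    fake = scores[labels == 0]
    best_gap, eer = np.inf, 1.0
    for t in np.unique(scores):
        far = np.mean(fake >= t)  # fakes accepted as real
        frr = np.mean(real < t)   # reals rejected as fake
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2.0
    return eer

def relative_improvement(baseline_eer, new_eer):
    """Relative EER reduction, e.g. 0.2 means 20% lower than the baseline."""
    return (baseline_eer - new_eer) / baseline_eer
```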
Approach
The proposed RegO method uses the Fisher information matrix to partition neurons into four regions based on their importance for real and fake audio detection. It then applies region-adaptive gradient optimization: direct fine-tuning for the less important regions, gradient updates in parallel directions for regions important only to real audio detection, updates in orthogonal directions for regions important only to fake audio detection, and sample proportion-based adaptive updates for regions important to both. An Ebbinghaus forgetting mechanism is also introduced to release redundant neurons from old tasks, promoting adaptation and generalized feature learning.
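
The region-adaptive update can be sketched as a per-region gradient projection. This summary does not specify the reference direction used for the parallel and orthogonal updates or the exact mixing rule for the shared region, so the sketch below makes assumptions: `ref` is a hypothetical reference direction (for example, a gradient retained from old tasks), region labels follow a 0 to 3 scheme (0 less important, 1 real only, 2 fake only, 3 both), and the shared region blends the two components by the fraction of real samples, `real_fraction`. It is an illustration of the idea, not the authors' implementation.

```python
import numpy as np

def split_gradient(grad, ref, eps=1e-12):
    """Split grad into components parallel and orthogonal to ref."""
    grad = np.asarray(grad, dtype=float)
    ref = np.asarray(ref, dtype=float)
    coef = grad @ ref / (ref @ ref + eps)
    parallel = coef * ref
    return parallel, grad - parallel

def region_adaptive_gradient(grad, ref, regions, real_fraction):
    """Modify the new-task gradient region by region (assumed rules).

    regions: integer labels per parameter (0, 1, 2, or 3).
    real_fraction: share of real samples, used only for region 3.
    """
    grad = np.asarray(grad, dtype=float)
    ref = np.asarray(ref, dtype=float)
    regions = np.asarray(regions)
    out = grad.copy()  # region 0: plain fine-tuning, gradient unchanged
    for label in (1, 2, 3):
        mask = regions == label
        if not mask.any():
            continue
        par, orth = split_gradient(grad[mask], ref[mask])
        if label == 1:
            out[mask] = par    # real-only region: parallel update
        elif label == 2:
            out[mask] = orth   # fake-only region: orthogonal update
        else:
            # Shared region: blend by sample proportion (assumption).
            out[mask] = real_fraction * par + (1.0 - real_fraction) * orth
    return out
```

The projection is computed within each region's sub-vector, so the direction constraint in one region does not depend on gradient entries that belong to another.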
Datasets
EVDA benchmark (FMFCC, In the Wild, ADD 2022, ASVspoof2015, ASVspoof2019, ASVspoof2021, FoR, HAD), CLEAR benchmark (for general study on image recognition).
Model(s)
Wav2vec 2.0 (feature extractor, with XLSR-53 pre-trained weights), 5-layer SimpleMlp (backend classifier).
Author countries
China