Generalized Fake Audio Detection via Deep Stable Learning

Authors: Zhiyong Wang, Ruibo Fu, Zhengqi Wen, Yuankun Xie, Yukun Liu, Xiaopeng Wang, Xuefei Liu, Yongwei Li, Jianhua Tao, Yi Lu, Xin Qi, Shuchen Shi

Published: 2024-06-05 13:16:31+00:00

AI Summary

This paper proposes a Sample Weight Learning (SWL) module for generalized fake audio detection. SWL addresses distribution shift by decorrelating features via learned sample weights, improving generalization across datasets without needing extra training data or complex training processes.

Abstract

Although current fake audio detection approaches have achieved remarkable success on specific datasets, they often fail when evaluated with datasets from different distributions. Previous studies typically address distribution shift by focusing on using extra data or applying extra loss restrictions during training. However, these methods either require a substantial amount of data or complicate the training process. In this work, we propose a stable learning-based training scheme that involves a Sample Weight Learning (SWL) module, addressing distribution shift by decorrelating all selected features via learning weights from training samples. The proposed portable plug-in-like SWL is easy to apply to multiple base models and generalizes them without using extra data during training. Experiments conducted on the ASVspoof datasets clearly demonstrate the effectiveness of SWL in generalizing different models across three evaluation datasets from different distributions.

Key findings

SWL consistently improves the generalization of multiple base models across the ASVspoof 2019 and 2021 datasets. Increasing the number of Random Fourier Feature (RFF) mapping functions generally improves generalization. Choosing which features to decorrelate (e.g., focusing on the spectral features in AASIST-L) further enhances performance.

Approach

The authors propose a stable learning-based training scheme using a Sample Weight Learning (SWL) module. SWL decorrelates features using Random Fourier Features (RFF) and an iterative optimization strategy to learn sample weights, improving model generalization across different audio datasets.
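
The idea of learning sample weights that decorrelate RFF-mapped features can be illustrated with a small sketch. This is not the authors' implementation: it assumes a StableNet-style objective (the sum of squared weighted cross-covariances between RFF maps of each feature pair), uses two synthetic scalar features in place of a base model's embeddings, and optimizes with plain projected gradient descent with backtracking. All function names (`rff_map`, `dependence_loss`, `learn_sample_weights`) and hyperparameters are hypothetical and chosen only for illustration.

```python
import math
import random


def rff_map(column, num_functions, rng):
    """Map a scalar feature column (length n) to an n x num_functions RFF matrix."""
    params = [(rng.gauss(0.0, 1.0), rng.uniform(0.0, 2.0 * math.pi))
              for _ in range(num_functions)]
    return [[math.sqrt(2.0) * math.cos(omega * x + phi) for omega, phi in params]
            for x in column]


def cross_cov(fa, fb, w):
    """Weighted cross-covariance between two RFF-mapped features (assumed form)."""
    n = len(w)
    ka, kb = len(fa[0]), len(fb[0])
    mean_a = [sum(w[i] * fa[i][s] for i in range(n)) / n for s in range(ka)]
    mean_b = [sum(w[i] * fb[i][t] for i in range(n)) / n for t in range(kb)]
    cov = [[sum((w[i] * fa[i][s] - mean_a[s]) * (w[i] * fb[i][t] - mean_b[t])
                for i in range(n)) / (n - 1)
            for t in range(kb)] for s in range(ka)]
    return cov, mean_a, mean_b


def dependence_loss(mapped, w):
    """Sum of squared Frobenius norms of cross-covariances over all feature pairs."""
    loss = 0.0
    for a in range(len(mapped)):
        for b in range(a + 1, len(mapped)):
            cov, _, _ = cross_cov(mapped[a], mapped[b], w)
            loss += sum(c * c for row in cov for c in row)
    return loss


def loss_gradient(mapped, w):
    """Analytic gradient of dependence_loss with respect to the sample weights."""
    n = len(w)
    grad = [0.0] * n
    for a in range(len(mapped)):
        for b in range(a + 1, len(mapped)):
            fa, fb = mapped[a], mapped[b]
            cov, mean_a, mean_b = cross_cov(fa, fb, w)
            for i in range(n):
                for s in range(len(mean_a)):
                    for t in range(len(mean_b)):
                        d = (fa[i][s] * (w[i] * fb[i][t] - mean_b[t])
                             + (w[i] * fa[i][s] - mean_a[s]) * fb[i][t]) / (n - 1)
                        grad[i] += 2.0 * cov[s][t] * d
    return grad


def project(w):
    """Keep weights positive and rescale them so they sum to n."""
    clipped = [max(x, 1e-6) for x in w]
    scale = len(w) / sum(clipped)
    return [x * scale for x in clipped]


def learn_sample_weights(mapped, steps=50, lr=1.0):
    """Projected gradient descent; a step is kept only if it lowers the loss."""
    w = [1.0] * len(mapped[0])
    best = dependence_loss(mapped, w)
    for _ in range(steps):
        grad = loss_gradient(mapped, w)
        candidate = project([wi - lr * g for wi, g in zip(w, grad)])
        cand_loss = dependence_loss(mapped, candidate)
        if cand_loss < best:
            w, best = candidate, cand_loss
        else:
            lr *= 0.5  # backtrack when the step does not help
    return w, best


rng = random.Random(0)
x1 = [rng.gauss(0.0, 1.0) for _ in range(60)]
x2 = [a + 0.3 * rng.gauss(0.0, 1.0) for a in x1]  # deliberately correlated with x1
mapped = [rff_map(x1, 3, rng), rff_map(x2, 3, rng)]

uniform_loss = dependence_loss(mapped, [1.0] * 60)
weights, learned_loss = learn_sample_weights(mapped)
print(f"uniform weights: {uniform_loss:.4f}  learned weights: {learned_loss:.4f}")
```

In a full training loop, weights learned this way would typically scale each sample's classification loss, alternating with updates to the base model; that part is omitted here. Raising the `num_functions` argument corresponds to using more RFF mapping functions, at a cost that grows quadratically per feature pair in this sketch.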

Datasets

ASVspoof 2019 (Logical Access subset), ASVspoof 2021 (LA and DF subsets)

Model(s)

AASIST, AASIST-L, RawNet2, TSSD

Author countries

China
