Attack Agnostic Dataset: Towards Generalization and Stabilization of Audio DeepFake Detection


Authors: Piotr Kawa, Marcin Plata, Piotr Syga

Published: 2022-06-27 12:30:44+00:00

AI Summary

This paper introduces the Attack Agnostic Dataset, which combines audio deepfake and anti-spoofing datasets to improve the generalization and stability of audio deepfake detection methods. It also proposes an LCNN model with LFCC and mel-spectrogram front-ends that shows better generalization, stability, and performance than the LFCC-based baseline.

Abstract

Audio DeepFakes allow the creation of high-quality, convincing utterances and therefore pose a threat due to their potential applications such as impersonation or fake news. Methods for detecting these manipulations should be characterized by good generalization and stability, leading to robustness against attacks conducted with techniques that are not explicitly included in the training. In this work, we introduce Attack Agnostic Dataset - a combination of two audio DeepFake datasets and one anti-spoofing dataset that, thanks to the disjoint use of attacks, can lead to better generalization of detection methods. We present a thorough analysis of current DeepFake detection methods and consider different audio features (front-ends). In addition, we propose a model based on LCNN with LFCC and mel-spectrogram front-end, which is not only characterized by good generalization and stability results but also shows improvement over the LFCC-based model - we decrease standard deviation on all folds and EER in two folds by up to 5%.
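The abstract reports results as equal error rate (EER) and as standard deviation across folds. As a rough illustration of how such numbers can be computed, the sketch below estimates EER by sweeping thresholds over detector scores and then summarizes several runs. The function names, the score convention (higher means more likely bona fide), and the threshold-sweep approximation are assumptions made for this example, not the authors' evaluation code.

```python
import statistics


def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate EER; assumes a higher score means 'more likely bona fide'."""
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection rate: bona fide utterances scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed utterances scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # EER is taken where the two error rates are closest to equal.
            best_gap, eer = gap, (far + frr) / 2
    return eer


def summarize_runs(eers):
    """Mean and sample standard deviation of EER over folds or repeated runs."""
    return statistics.mean(eers), statistics.stdev(eers)
```

With this convention, a lower mean EER indicates better detection, and a lower standard deviation indicates more stable training.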

Key findings

The proposed LCNN model with LFCC and mel-spectrogram front-ends outperforms other models in terms of generalization and stability on the Attack Agnostic Dataset. The use of LFCC features is shown to be more effective for deepfake detection than MFCC features. The stability of deepfake detection training is generally low, with significant fluctuations in test accuracy despite stable train accuracy.

Approach

The authors address the generalization and stability issues in audio deepfake detection by creating the Attack Agnostic Dataset, which combines diverse datasets with disjoint attacks. They then evaluate several deep learning models on this dataset, proposing a modified LCNN model that uses both LFCC and mel-spectrogram features for improved performance.
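The idea of folds with disjoint attacks can be sketched as follows. This is a minimal illustration, assuming each spoofed utterance is tagged with the attack that generated it. The function name, the input layout, and the round-robin assignment of attacks and bona fide utterances to folds are hypothetical and do not reproduce the paper's actual split.

```python
def attack_disjoint_folds(spoof, bonafide, n_folds=3):
    """Build folds whose test attacks never appear in the matching train set.

    spoof:    list of (utterance_id, attack_id) pairs (hypothetical layout)
    bonafide: list of utterance_id values for genuine speech
    """
    attacks = sorted({attack for _, attack in spoof})
    folds = []
    for k in range(n_folds):
        # Hold out every n_folds-th attack for testing in fold k.
        held_out = set(attacks[k::n_folds])
        test = [u for u, a in spoof if a in held_out]
        train = [u for u, a in spoof if a not in held_out]
        # Bona fide speech has no attack label, so split it by position.
        test += [u for i, u in enumerate(bonafide) if i % n_folds == k]
        train += [u for i, u in enumerate(bonafide) if i % n_folds != k]
        folds.append({"train": train, "test": test, "held_out": held_out})
    return folds
```

Because a detector trained on such a fold is tested only on attacks it has never seen, its test error reflects generalization to unseen attacks rather than memorization of known ones.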

Datasets

FakeAVCeleb (audio subset), WaveFake, ASVspoof 2019 LA subset, LJSpeech, JSUT

Model(s)

LCNN, XceptionNet, MesoInception-4, RawNet2, GMM

Author countries

Poland
