SpecRNet: Towards Faster and More Accessible Audio DeepFake Detection
Authors: Piotr Kawa, Marcin Plata, Piotr Syga
Published: 2022-10-12 11:36:14+00:00
AI Summary
The paper introduces SpecRNet, a novel neural network architecture for audio DeepFake detection designed for fast inference and low computational requirements. Benchmarks show that SpecRNet achieves performance comparable to the state-of-the-art LCNN architecture while requiring up to about 40% less time to process an audio sample.
Abstract
Audio DeepFakes are utterances generated with the use of deep neural networks. They are highly misleading and pose a threat because of their use in fake news, impersonation, or extortion. In this work, we focus on increasing the accessibility of audio DeepFake detection methods by providing SpecRNet, a neural network architecture characterized by quick inference time and low computational requirements. Our benchmark shows that SpecRNet, while requiring up to about 40% less time to process an audio sample, provides performance comparable to the LCNN architecture, one of the best audio DeepFake detection models. Such a method can be used not only by online multimedia services to verify the large volume of content uploaded daily but also, thanks to its low requirements, by average citizens to evaluate materials on their own devices. In addition, we provide benchmarks in three unique settings that confirm the correctness of our model. They reflect scenarios of low-resource datasets, detection on short utterances, and a limited-attacks benchmark in which we take a closer look at the influence of particular attacks on the evaluated architectures.
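The abstract compares models by the time needed to process a single audio sample. The sketch below is not the authors' benchmark code; it is a minimal illustration of how such a per-sample inference-time measurement could be set up in PyTorch, with a placeholder detector (TinySpectrogramDetector) and an assumed spectrogram shape standing in for SpecRNet or LCNN.

```python
# Minimal sketch (assumptions: PyTorch model, illustrative input shape) of
# measuring per-sample inference time for a spectrogram-based detector.
import time
import torch
import torch.nn as nn


class TinySpectrogramDetector(nn.Module):
    """Hypothetical placeholder standing in for a detector such as SpecRNet or LCNN."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(16, 1)  # single bonafide-vs-spoofed score

    def forward(self, x):
        h = self.features(x).flatten(1)
        return self.classifier(h)


def mean_inference_time(model, sample, runs=100):
    """Average wall-clock seconds to process one spectrogram."""
    model.eval()
    with torch.no_grad():
        model(sample)  # warm-up pass
        start = time.perf_counter()
        for _ in range(runs):
            model(sample)
        return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    # One mel-spectrogram-like input; the shape here is assumed, not from the paper.
    spectrogram = torch.randn(1, 1, 80, 400)
    detector = TinySpectrogramDetector()
    print(f"avg time per sample: {mean_inference_time(detector, spectrogram):.4f} s")
```

Running the same measurement for two detectors on identical inputs gives the kind of relative processing-time comparison the abstract reports.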