Defense Against Adversarial Attacks on Audio DeepFake Detection

Authors: Piotr Kawa, Marcin Plata, Piotr Syga

Published: 2022-12-30 08:41:06+00:00

AI Summary

This research evaluates the robustness of three audio deepfake detection architectures against adversarial attacks. The authors introduce a novel adaptive adversarial training method to enhance the robustness of these detectors and, notably, adapt RawNet3 to deepfake detection for the first time.

Abstract

Audio DeepFakes (DF) are artificially generated utterances created using deep learning, with the primary aim of fooling listeners in a highly convincing manner. Their quality is sufficient to pose a severe threat in terms of security and privacy, including undermining the reliability of news or enabling defamation. Multiple neural network-based methods for detecting generated speech have been proposed to counter these threats. In this work, we cover the topic of adversarial attacks, which decrease the performance of detectors by adding superficial (difficult for a human to spot) changes to input data. Our contribution consists of evaluating the robustness of three detection architectures against adversarial attacks in two scenarios (white-box and transferability-based) and then enhancing it via adversarial training performed with our novel adaptive training method. Moreover, one of the investigated architectures is RawNet3, which, to the best of our knowledge, we adapt to DeepFake detection for the first time.
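The attacks discussed here perturb a detector's input by a small, bounded amount derived from the model's gradients. As a minimal illustration, below is a sketch of a one-step FGSM attack (one of the attacks evaluated in the paper) in PyTorch against a generic waveform classifier; the `model`, `waveform`, `label`, and `epsilon` names are illustrative placeholders, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, waveform, label, epsilon=1e-3):
    """One-step FGSM: perturb the input in the direction of the loss gradient.

    `model` is assumed to be any differentiable detector (e.g. an LCNN- or
    RawNet3-style classifier) mapping a waveform batch to class logits.
    """
    waveform = waveform.clone().detach().requires_grad_(True)
    logits = model(waveform)
    loss = F.cross_entropy(logits, label)
    model.zero_grad()
    loss.backward()
    # Step in the sign of the input gradient; epsilon bounds the L-inf norm
    # of the perturbation, keeping the change hard for a human to notice.
    adversarial = waveform + epsilon * waveform.grad.sign()
    return adversarial.detach()
```

Iterative attacks such as PGD repeat this step several times with projection back into the epsilon-ball, which generally makes them stronger than single-step FGSM.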


Key findings
Adversarial attacks significantly reduce the effectiveness of audio deepfake detectors. The proposed adaptive adversarial training method substantially improves the robustness of the LCNN model against both white-box and transferability attacks while maintaining acceptable performance on clean data. RawNet3 shows inherent robustness to these attacks.
Approach
The study evaluates the robustness of three audio deepfake detection models (LCNN, RawNet3, SpecRNet) against various adversarial attacks (FGSM, PGD, FAB) in white-box and transferability scenarios. It then employs a novel adaptive adversarial training method to improve the models' resistance to these attacks.
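The paper's adaptive training procedure is not detailed in this summary; the following is a minimal sketch of generic adversarial training for a PyTorch classifier, in which a fixed `adv_ratio` mix of clean and adversarial samples stands in, as a hypothetical simplification, for the authors' adaptive scheme. `attack_fn` can be any attack, e.g. the FGSM sketch above.

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, batch, labels,
                              attack_fn, adv_ratio=0.5):
    """One optimization step on a mix of clean and adversarial samples.

    `attack_fn(model, inputs, labels)` returns adversarial examples crafted
    against the current model; `adv_ratio` controls what fraction of the
    batch is replaced by them (a fixed stand-in for an adaptive schedule).
    """
    model.train()
    n_adv = int(adv_ratio * batch.size(0))
    # Craft adversarial versions of the first part of the batch.
    adv = attack_fn(model, batch[:n_adv], labels[:n_adv])
    mixed = torch.cat([adv, batch[n_adv:]], dim=0)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(mixed), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Crafting the adversarial examples against the current model at every step means the defense tracks the model's evolving decision boundary rather than a fixed set of precomputed perturbations.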
Datasets
ASVspoof2021 (DF subset), FakeAVCeleb, and WaveFake datasets, comprising a total of 41,217 real and 702,269 fake samples generated using over 100 deepfake methods.
Model(s)
LCNN, RawNet3, SpecRNet
Author countries
Poland