Can DeepFake Speech be Reliably Detected?

Authors: Hongbin Liu, Youzheng Chen, Arun Narayanan, Athula Balachandran, Pedro J. Moreno, Lun Wang

Published: 2024-10-09 06:13:48+00:00

AI Summary

This research systematically studies malicious attacks against state-of-the-art open-source synthetic speech detectors (SSDs). It evaluates white-box, black-box, and agnostic attacks, measuring attack effectiveness and stealthiness with both objective metrics and human ratings, and reveals significant vulnerabilities in current SSDs.

Abstract

Recent advances in text-to-speech (TTS) systems, particularly those with voice cloning capabilities, have made voice impersonation readily accessible, raising ethical and legal concerns due to potential misuse for malicious activities like misinformation campaigns and fraud. While synthetic speech detectors (SSDs) exist to combat this, they are vulnerable to "test domain shift", exhibiting decreased performance when audio is altered through transcoding, playback, or background noise. This vulnerability is further exacerbated by deliberate manipulation of synthetic speech aimed at deceiving detectors. This work presents the first systematic study of such active malicious attacks against state-of-the-art open-source SSDs. White-box attacks, black-box attacks, and their transferability are studied in terms of both attack effectiveness and stealthiness, using both hardcoded metrics and human ratings. The results highlight the urgent need for more robust detection methods in the face of evolving adversarial threats.


Key findings

State-of-the-art open-source SSDs are highly vulnerable to adversarial attacks, particularly to attacks with access to the model, and even agnostic attacks achieved significant success rates. Attacks can be highly effective while maintaining reasonable audio quality, as judged by both objective metrics and human listeners.

Approach

The researchers used white-box (PGD and I-FGSM), black-box (SimBA), and agnostic attacks to test the robustness of four open-source SSDs. Attack effectiveness and stealthiness were measured using attack success rates, ViSQOL scores, and human ratings of audio quality.
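
To make the white-box setting concrete, here is a minimal PyTorch sketch (not the authors' implementation) of an L-infinity PGD attack that perturbs a spoofed waveform toward the detector's bona fide class. The `detector` model, the class index, and the eps/alpha/steps values are illustrative assumptions.

```python
# Illustrative PGD attack on a waveform-input synthetic speech detector (SSD).
# Assumptions (not from the paper): `detector` maps a (batch, samples) waveform
# tensor to logits over [bona fide, spoof]; eps, alpha, and steps are guesses.
import torch

def pgd_attack(detector, waveform, target_class=0, eps=1e-3, alpha=2e-4, steps=40):
    """Return an adversarial waveform within an L-infinity ball of radius eps
    that pushes the detector's prediction toward `target_class` (bona fide)."""
    detector.eval()
    x_orig = waveform.detach()
    x_adv = x_orig.clone()
    target = torch.full((x_orig.shape[0],), target_class, dtype=torch.long)
    loss_fn = torch.nn.CrossEntropyLoss()

    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = loss_fn(detector(x_adv), target)                 # loss w.r.t. the desired label
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv - alpha * grad.sign()                 # step toward the bona fide class
            x_adv = x_orig + (x_adv - x_orig).clamp(-eps, eps)  # project into the eps-ball
            x_adv = x_adv.clamp(-1.0, 1.0)                      # keep a valid audio range
    return x_adv.detach()
```

An attack counts as successful when the perturbed spoof clip is accepted as bona fide; stealthiness can then be judged by comparing the adversarial audio against the original with a perceptual metric such as ViSQOL and by human listening, mirroring the paper's evaluation.
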
Datasets

ASVSpoof2019-LA, WaveFake, and In-the-wild datasets.
Model(s)

AASIST, AASIST-L, RawNet2, and RawGAT-ST.
Author countries

USA