Spoofing attack augmentation: can differently-trained attack models improve generalisation?

Authors: Wanying Ge, Xin Wang, Junichi Yamagishi, Massimiliano Todisco, Nicholas Evans

Published: 2023-09-18 08:47:54+00:00

AI Summary

This paper investigates the variability of deepfake detection performance caused by differences in how spoofing attack models are trained. It demonstrates that even subtle changes to the training of spoofing attacks can significantly degrade detection accuracy, and proposes spoofing attack augmentation as a complementary technique to improve generalisation.

Abstract

A reliable deepfake detector or spoofing countermeasure (CM) should be robust in the face of unpredictable spoofing attacks. To encourage the learning of more generalisable artefacts, rather than those specific only to known attacks, CMs are usually exposed to a broad variety of different attacks during training. Even so, the performance of deep-learning-based CM solutions is known to vary, sometimes substantially, when they are retrained with different initialisations, hyper-parameters or training data partitions. We show in this paper that the potency of spoofing attacks, also deep-learning-based, can similarly vary according to training conditions, sometimes resulting in substantial degradations to detection performance. Nevertheless, while a RawNet2 CM model is vulnerable when only modest adjustments are made to the attack algorithm, those based upon graph attention networks and self-supervised learning are reassuringly robust. The focus upon training data generated with different attack algorithms might not be sufficient on its own to ensure generalisability; some form of spoofing attack augmentation at the algorithm level can be complementary.


Key findings

Deepfake detection models showed varying vulnerabilities to differently-trained spoofing attacks. Models based on graph attention networks and self-supervised learning showed greater robustness. Training detectors with data from multiple, differently configured attack models improved generalisation and robustness.
Approach

The authors trained several deepfake detection models (AASIST, RawNet2, SSL-AASIST) on audio generated by VITS text-to-speech models trained under varied conditions. They evaluated detection performance under matched and mismatched training/testing conditions, examining how different attack-model configurations affect generalisation.
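The augmentation idea described above amounts to pooling spoofed training data from several differently-configured attack models so the countermeasure sees attack-level variability rather than one model's artefacts. The sketch below is illustrative only, assuming simple utterance-ID lists; the pool names and function are hypothetical, not from the paper.

```python
import random

def augment_with_attacks(bonafide, attack_pools, n_spoof_per_pool, seed=0):
    """Build a CM training list that mixes bona fide utterances (label 0)
    with spoofed utterances (label 1) drawn from several differently-trained
    attack models, so no single attack configuration dominates."""
    rng = random.Random(seed)
    data = [(utt, 0) for utt in bonafide]
    for name, pool in attack_pools.items():
        # Draw the same number of spoofed samples from every attack model.
        for utt in rng.sample(pool, min(n_spoof_per_pool, len(pool))):
            data.append((utt, 1))
    rng.shuffle(data)
    return data

# Hypothetical pools: three VITS attack models trained with different
# seeds or hyper-parameters (names are placeholders).
pools = {
    "vits_seed1": [f"s1_{i}" for i in range(100)],
    "vits_seed2": [f"s2_{i}" for i in range(100)],
    "vits_lr_tweak": [f"s3_{i}" for i in range(100)],
}
train = augment_with_attacks([f"bf_{i}" for i in range(150)], pools, 50)
```

Balancing draws per attack model, as sketched here, keeps the spoofed class from being dominated by whichever attack model produced the most data.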
Datasets

VCTK database (for training the VITS spoofing attack models and countermeasures); ASVspoof 2019 database (for countermeasure architecture and hyperparameter optimisation)
Model(s)

AASIST (graph attention networks), RawNet2 (residual blocks and GRU), SSL-AASIST (wav2vec 2.0 front-end with AASIST back-end), VITS (variational-autoencoder-based TTS model)
Author countries

France, Japan