Augmentation through Laundering Attacks for Audio Spoof Detection

Authors: Hashim Ali, Surya Subramani, Hafiz Malik

Published: 2024-10-01 22:34:51+00:00

AI Summary

This paper presents a submission to the ASVspoof 5 Challenge, investigating the performance of an Audio Spoof Detection (ASD) system. It focuses on training the AASIST model using data augmentation generated through various 'laundering attacks' to enhance robustness against diverse acoustic conditions, spoofing attacks, and codec conditions. The study evaluates the system's performance on the ASVspoof 5 database.

Abstract

Recent text-to-speech (TTS) developments have made voice cloning (VC) more realistic, affordable, and easily accessible. This has given rise to many potential abuses of this technology, including Joe Biden's New Hampshire deepfake robocall. Several methodologies have been proposed to detect such clones. However, these methodologies have been trained and evaluated on relatively clean databases. Recently, the ASVspoof 5 Challenge introduced a new crowd-sourced database of diverse acoustic conditions, including various spoofing attacks and codec conditions. This paper is our submission to the ASVspoof 5 Challenge and aims to investigate the performance of Audio Spoof Detection, trained using data augmentation through laundering attacks, on the ASVspoof 5 database. The results demonstrate that our system performs worst on the A18, A19, A20, A26, and A30 spoofing attacks and in the codec and compression conditions C08, C09, and C10.


Key findings
The system performed worst on specific adversarial and complex TTS spoofing attacks (A18, A19, A20, A26, A30) and under codec conditions involving an 8 kHz sampling rate (C08, C09, C10) or Encodec (C04, C07). It performed best on the A29 TTS attack and under the no-codec (C00) and MP3 codec (C05) conditions. Its pooled results were a minDCF of 0.662 and an EER of 25.319%.
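For readers unfamiliar with the EER reported above, the following is a minimal sketch of how an equal error rate can be computed from detector scores. It is not the challenge's official scoring code. The function name `compute_eer` and the convention that higher scores mean "more likely bona fide" are assumptions made for illustration.

```python
import numpy as np


def compute_eer(bonafide_scores, spoof_scores):
    """Return the equal error rate as a fraction in [0, 1].

    Assumed convention: a trial is accepted as bona fide when its
    score is greater than or equal to the decision threshold.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Candidate thresholds: every observed score, plus one value above
    # the maximum so that "reject everything" is also considered.
    thresholds = np.unique(np.concatenate([bonafide, spoof]))
    thresholds = np.append(thresholds, thresholds[-1] + 1.0)

    # False acceptance rate: fraction of spoofed trials accepted.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    # False rejection rate: fraction of bona fide trials rejected.
    frr = np.array([(bonafide < t).mean() for t in thresholds])

    # The EER is taken where the two error rates are closest.
    idx = int(np.argmin(np.abs(far - frr)))
    return float((far[idx] + frr[idx]) / 2.0)
```

Under this definition, an EER of 25.319% means that at the operating point where both error rates are equal, roughly one in four trials of each class is misclassified.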
Approach
The approach augments the ASVspoof 5 training data by applying various 'laundering attacks' (reverberation, additive noise, recompression, resampling, and low-pass filtering) to 10% of the original dataset. The AASIST model is then trained on the combined augmented and original training data and evaluated on the ASVspoof 5 eval database.
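As a rough illustration of the augmentation step described above, the sketch below implements three of the listed laundering operations (additive noise, low-pass filtering, and resampling) with NumPy only, plus a helper that picks a 10% subset of utterances. It is not the authors' pipeline: all function names, the white-noise model, the windowed-sinc filter, the linear-interpolation resampler, and every parameter value are assumptions chosen for brevity. Reverberation and recompression are omitted because they require impulse responses and codec tooling that the summary does not specify.

```python
import numpy as np


def add_noise(wave, snr_db, rng):
    """Add white Gaussian noise at the requested SNR in dB (assumed noise model)."""
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return wave + noise


def low_pass(wave, sample_rate, cutoff_hz, num_taps=101):
    """Apply a Hamming-windowed sinc FIR low-pass filter."""
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    fc = cutoff_hz / sample_rate  # cutoff in cycles per sample
    taps = 2.0 * fc * np.sinc(2.0 * fc * n) * np.hamming(num_taps)
    taps /= taps.sum()  # unit gain at DC
    return np.convolve(wave, taps, mode="same")


def resample_round_trip(wave, sample_rate, target_rate):
    """Downsample to target_rate and back using linear interpolation.

    A crude stand-in for a proper polyphase resampler; the signal is
    low-pass filtered first to limit aliasing.
    """
    t_orig = np.arange(len(wave)) / sample_rate
    n_low = int(round(len(wave) * target_rate / sample_rate))
    t_low = np.arange(n_low) / target_rate
    filtered = low_pass(wave, sample_rate, 0.45 * target_rate)
    low = np.interp(t_low, t_orig, filtered)
    return np.interp(t_orig, t_low, low)


def select_subset(num_utterances, fraction, rng):
    """Pick a random subset of utterance indices to launder."""
    k = int(round(fraction * num_utterances))
    return rng.choice(num_utterances, size=k, replace=False)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sr = 16000
    t = np.arange(sr) / sr
    wave = np.sin(2 * np.pi * 200 * t)  # synthetic one-second test tone
    laundered = resample_round_trip(add_noise(wave, 20.0, rng), sr, 8000)
    chosen = select_subset(1000, 0.10, rng)
    print(laundered.shape, len(chosen))
```

In a training setup along these lines, the laundered copies of the selected utterances would be added to the original training set with their original bona fide or spoof labels.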
Datasets
ASVspoof 5 Challenge database (train, development, and eval splits)
Model(s)
AASIST (Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks)
Author countries
USA