Augmentation through Laundering Attacks for Audio Spoof Detection

Authors: Hashim Ali, Surya Subramani, Hafiz Malik

Published: 2024-10-01 22:34:51+00:00

AI Summary

This paper investigates the performance of an audio spoof detection system (AASIST) trained using data augmentation through laundering attacks on the ASVspoof 5 database. The results show that the system performs worst on specific spoofing attacks and codec conditions, highlighting challenges in real-world audio deepfake detection.

Abstract

Recent text-to-speech (TTS) developments have made voice cloning (VC) more realistic, affordable, and easily accessible. This has given rise to many potential abuses of this technology, including Joe Biden's New Hampshire deepfake robocall. Several methodologies have been proposed to detect such clones. However, these methodologies have been trained and evaluated on relatively clean databases. Recently, the ASVspoof 5 Challenge introduced a new crowd-sourced database of diverse acoustic conditions, including various spoofing attacks and codec conditions. This paper is our submission to the ASVspoof 5 Challenge and aims to investigate the performance of Audio Spoof Detection, trained using data augmentation through laundering attacks, on the ASVspoof 5 database. The results demonstrate that our system performs worst on A18, A19, A20, A26, and A30 spoofing attacks and in the codec and compression conditions of C08, C09, and C10.

Key findings

The AASIST system, trained on the augmented dataset, achieved a pooled minDCF of 0.662 and an EER of 25.319%. Performance was significantly degraded by specific attacks (A18, A19, A20, A26, A30) and codec conditions (C08, C09, C10), particularly those involving adversarial attacks and 8 kHz sampling rates.
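
For readers unfamiliar with the EER figure quoted above, the following is a minimal sketch of how an equal error rate can be estimated from detector scores. It is not the challenge's official scoring script. The function name `compute_eer` and the convention that higher scores mean "more likely bona fide" are assumptions made for illustration. minDCF is not covered here because it depends on cost parameters that this summary does not state.

```python
import numpy as np

def compute_eer(bonafide_scores, spoof_scores):
    """Estimate the equal error rate (EER) and the threshold where it occurs.

    Assumption: a higher score means the detector considers the
    utterance more likely to be bona fide.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Every observed score is a candidate decision threshold.
    thresholds = np.sort(np.concatenate([bonafide, spoof]))

    # False rejection rate: bona fide trials scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: spoofed trials scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # The EER is the operating point where the two error rates cross.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]
```

An EER of 25.319% therefore means that, at the threshold where both error types are equally frequent, roughly one in four trials of each class is misclassified.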

Approach

The authors augment the ASVspoof 5 training dataset by applying various laundering attacks (noise addition, reverberation, recompression, resampling, filtering) to a subset of the data. They then train the AASIST model on this augmented dataset and evaluate its performance on the ASVspoof 5 evaluation set.
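
To make the idea of laundering-based augmentation concrete, here is a minimal NumPy sketch of three of the attack types listed above: noise addition, reverberation, and resampling. It is not the authors' pipeline. All function names, the SNR and RT60 values, the white-noise source, the synthetic impulse response, and the linear-interpolation resampler are assumptions chosen to keep the example self-contained. Recompression and filtering are omitted because they would need a codec or a filter-design library.

```python
import numpy as np

def add_noise(wave, snr_db, rng):
    """Add white Gaussian noise at the requested signal-to-noise ratio (dB)."""
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return wave + noise

def add_reverb(wave, sample_rate, rt60, rng):
    """Convolve with a synthetic impulse response (exponentially decaying noise)."""
    n = int(rt60 * sample_rate)
    t = np.arange(n) / sample_rate
    # Amplitude falls by 60 dB over rt60 seconds: 10 ** (-3 * t / rt60).
    ir = rng.normal(size=n) * 10 ** (-3.0 * t / rt60)
    ir[0] = 1.0  # direct path
    wet = np.convolve(wave, ir)[: len(wave)]
    # Rescale so the output peak matches the input peak.
    peak = np.max(np.abs(wet))
    return wet * (np.max(np.abs(wave)) / peak) if peak > 0 else wet

def resample_linear(wave, orig_sr, target_sr):
    """Crude resampling by linear interpolation (no anti-aliasing filter)."""
    n_out = int(round(len(wave) * target_sr / orig_sr))
    old_t = np.arange(len(wave)) / orig_sr
    new_t = np.arange(n_out) / target_sr
    return np.interp(new_t, old_t, wave)

def launder(wave, sample_rate, rng):
    """Apply one randomly chosen laundering attack to a waveform."""
    choice = rng.integers(3)
    if choice == 0:
        return add_noise(wave, snr_db=rng.choice([0, 10, 20]), rng=rng)
    if choice == 1:
        return add_reverb(wave, sample_rate, rt60=rng.choice([0.3, 0.6, 0.9]), rng=rng)
    # Resampling attack: down to 8 kHz and back to the original rate.
    low = resample_linear(wave, sample_rate, 8000)
    restored = resample_linear(low, 8000, sample_rate)[: len(wave)]
    # Pad with zeros if rounding left the round trip one sample short.
    return np.pad(restored, (0, len(wave) - len(restored)))
```

In a training setup along these lines, `launder` would be applied to a subset of the training utterances and the laundered copies added alongside the originals, each keeping its original bona fide or spoof label.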

Datasets

ASVspoof 5 database (train and evaluation sets)

Model(s)

AASIST (RawNet2-based encoder with heterogeneous stacking graph attention layers, max graph operation, and readout operation)

Author countries

USA
