Bridging the Spoof Gap: A Unified Parallel Aggregation Network for Voice Presentation Attacks
Authors: Awais Khan, Khalid Mahmood Malik
Published: 2023-09-19 12:12:59+00:00
AI Summary
This paper introduces a Parallel Stacked Aggregation Network to bridge the gap in unified spoofing detection for Automatic Speaker Verification (ASV) systems, which are vulnerable to both logical access (LA) and physical access (PA) attacks. The proposed approach processes raw audio directly, using a split-transform-aggregation technique to identify spoofing attacks. It outperforms state-of-the-art solutions on the ASVspoof-2019 and VSDC datasets, showing reduced Equal Error Rate (EER) disparities and better generalizability across attack types.
Abstract
Automatic Speaker Verification (ASV) systems are increasingly used in voice biometrics for user authentication, but they are susceptible to logical and physical spoofing attacks, which pose security risks. Existing research mainly tackles logical or physical attacks separately, leaving a gap in unified spoofing detection. Moreover, when existing systems attempt to handle both types of attack, they often exhibit significant disparities in Equal Error Rate (EER) between the two. To bridge this gap, we present a Parallel Stacked Aggregation Network that processes raw audio. Our approach employs a split-transform-aggregation technique: it divides utterances into convolved representations, applies transformations, and aggregates the results to identify logical access (LA) and physical access (PA) spoofing attacks. Evaluation on the ASVspoof-2019 and VSDC datasets shows the effectiveness of the proposed system. It outperforms state-of-the-art solutions, with reduced EER disparities and superior performance in detecting spoofing attacks, which highlights the proposed method's generalizability. In a world increasingly reliant on voice-based security, our unified spoofing detection system provides a robust defense against a spectrum of voice spoofing attacks, safeguarding ASV systems and user data.
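Because the abstract reports results as Equal Error Rate (EER), a minimal sketch of how that metric can be estimated from detector scores may help. EER is the operating point at which the false acceptance rate (spoofed audio accepted as bona fide) equals the false rejection rate (bona fide audio rejected). The function name, the score convention (higher means more likely bona fide), and the toy scores below are assumptions made for illustration; this is not the authors' evaluation code.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    Assumption: a higher score means "more likely bona fide", and a trial
    is accepted when its score is >= the threshold.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    best = None  # (gap, eer_estimate, threshold)
    for t in sorted(set(bonafide_scores) | set(spoof_scores)):
        # False acceptance rate: spoofed trials that pass the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection rate: bona fide trials that fall below it.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Average the two rates in case they never cross exactly.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical scores, for illustration only.
bonafide = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(bonafide, spoof)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

Under this convention, the EER disparity the abstract refers to would be the difference between the EER obtained on LA trials and the EER obtained on PA trials when both are scored by the same model.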