Bridging the Spoof Gap: A Unified Parallel Aggregation Network for Voice Presentation Attacks
Authors: Awais Khan, Khalid Mahmood Malik
Published: 2023-09-19 12:12:59+00:00
AI Summary
This paper proposes a Parallel Stacked Aggregation (PSA) network for unified voice spoofing detection, addressing a gap in existing research, which tackles logical and physical attacks separately. The PSA network processes raw audio using a split-transform-aggregation technique to identify both logical and physical attacks, and is reported to outperform state-of-the-art solutions while narrowing the Equal Error Rate (EER) disparity between the two attack types.
Abstract
Automatic Speaker Verification (ASV) systems are increasingly used in voice biometrics for user authentication but are susceptible to logical and physical spoofing attacks, posing security risks. Existing research mainly tackles logical or physical attacks separately, leaving a gap in unified spoofing detection. Moreover, when existing systems attempt to handle both types of attack, they often exhibit significant disparities in the Equal Error Rate (EER) between the two. To bridge this gap, we present a Parallel Stacked Aggregation Network that processes raw audio. Our approach employs a split-transform-aggregation technique: it divides utterances into convolved representations, applies transformations, and aggregates the results to identify logical (LA) and physical (PA) spoofing attacks. Evaluation on the ASVspoof-2019 and VSDC datasets shows the effectiveness of the proposed system: it outperforms state-of-the-art solutions, with reduced EER disparities and superior performance in detecting spoofing attacks, which highlights the method's generalizability. In a world increasingly reliant on voice-based security, our unified spoofing detection system provides a robust defense against a spectrum of voice spoofing attacks, effectively safeguarding ASV systems and user data.
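
Because the abstract's central claim concerns the EER and its disparity across attack types, a minimal sketch of how such a figure might be computed from detector scores can help make the metric concrete. This is not the authors' evaluation code: the function name `equal_error_rate`, the convention that a higher score means "more likely bona fide", and all score values below are illustrative assumptions, and the toy numbers are not results from the paper.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Return (eer, threshold) for a detector where a higher score means
    'more likely bona fide' (an assumed convention for this sketch)."""
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    best = None  # (gap between FAR and FRR, EER estimate, threshold)
    for t in sorted(set(bonafide_scores) | set(spoof_scores)):
        # False rejection: a bona fide trial scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: a spoofed trial scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # With finitely many trials FAR and FRR rarely coincide exactly,
            # so the EER is approximated by their mean at the closest point.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores, invented for illustration only (NOT taken from the paper).
la_bonafide, la_spoof = [0.9, 0.8, 0.7], [0.1, 0.2, 0.3]
pa_bonafide, pa_spoof = [0.9, 0.6, 0.4, 0.8], [0.1, 0.5, 0.3, 0.7]

la_eer, _ = equal_error_rate(la_bonafide, la_spoof)
pa_eer, _ = equal_error_rate(pa_bonafide, pa_spoof)

# The 'EER disparity' is read here as the absolute gap between the two EERs.
disparity = abs(la_eer - pa_eer)
print(f"LA EER={la_eer:.2%}  PA EER={pa_eer:.2%}  disparity={disparity:.2%}")
```

Under this reading, a unified detector is preferable when both EER values are low and the gap between them is small. Published ASVspoof results are typically produced with the challenge's official scoring tools, which may interpolate between thresholds rather than take the nearest point as this sketch does.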