Replay attack detection with complementary high-resolution information using end-to-end DNN for the ASVspoof 2019 Challenge
Authors: Jee-weon Jung, Hye-jin Shim, Hee-Soo Heo, Ha-Jin Yu
Published: 2019-04-23 03:29:36+00:00
AI Summary
This paper proposes an end-to-end deep neural network (DNN) for replay attack detection in speaker verification, using high-resolution spectrograms with complementary information (magnitude, phase, and power spectral density). The approach avoids handcrafted features, focusing instead on directly modeling raw audio information for improved robustness against advanced spoofing techniques.
Abstract
In this study, we concentrate on replacing the process of extracting hand-crafted acoustic feature with end-to-end DNN using complementary high-resolution spectrograms. As a result of advance in audio devices, typical characteristics of a replayed speech based on conventional knowledge alter or diminish in unknown replay configurations. Thus, it has become increasingly difficult to detect spoofed speech with a conventional knowledge-based approach. To detect unrevealed characteristics that reside in a replayed speech, we directly input spectrograms into an end-to-end DNN without knowledge-based intervention. Explorations dealt in this study that differentiates from existing spectrogram-based systems are twofold: complementary information and high-resolution. Spectrograms with different information are explored, and it is shown that additional information such as the phase information can be complementary. High-resolution spectrograms are employed with the assumption that the difference between a bona-fide and a replayed speech exists in the details. Additionally, to verify whether other features are complementary to spectrograms, we also examine raw waveform and an i-vector based system. Experiments conducted on the ASVspoof 2019 physical access challenge show promising results, where t-DCF and equal error rates are 0.0570 and 2.45 % for the evaluation set, respectively.