Attentive Filtering Networks for Audio Replay Attack Detection


Authors: Cheng-I Lai, Alberto Abad, Korin Richmond, Junichi Yamagishi, Najim Dehak, Simon King

Published: 2018-10-31 00:23:16+00:00

AI Summary

This paper proposes the Attentive Filtering Network (AFN) for audio replay attack detection. AFN applies an attention-based filtering mechanism that enhances feature representations in the time and frequency domains, then classifies the result with a ResNet-based classifier. The system reaches an evaluation equal error rate (EER) of 8.99% on the ASVspoof 2017 Version 2.0 dataset.

Abstract

An attacker may use a variety of techniques to fool an automatic speaker verification system into accepting them as a genuine user. Anti-spoofing methods meanwhile aim to make the system robust against such attacks. The ASVspoof 2017 Challenge focused specifically on replay attacks, with the intention of measuring the limits of replay attack detection as well as developing countermeasures against them. In this work, we propose our replay attacks detection system - Attentive Filtering Network, which is composed of an attention-based filtering mechanism that enhances feature representations in both the frequency and time domains, and a ResNet-based classifier. We show that the network enables us to visualize the automatically acquired feature representations that are helpful for spoofing detection. Attentive Filtering Network attains an evaluation EER of 8.99% on the ASVspoof 2017 Version 2.0 dataset. With system fusion, our best system further obtains a 30% relative improvement over the ASVspoof 2017 enhanced baseline system.
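The abstract reports results as an equal error rate (EER), the operating point at which the false acceptance rate (spoofed audio accepted) equals the false rejection rate (genuine audio rejected). The following is a minimal sketch of how an EER can be estimated from two lists of scores. The function name `compute_eer`, the convention that higher scores mean "more genuine", and the toy scores are illustrative assumptions and are not taken from the paper or the challenge tooling.

```python
import numpy as np

def compute_eer(genuine_scores, spoof_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    Assumption: a trial is accepted when its score is >= the threshold,
    so higher scores mean "more likely genuine".
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Candidate thresholds: every observed score, plus +inf (reject everything).
    thresholds = np.unique(np.concatenate([genuine, spoof, [np.inf]]))

    # False acceptance rate: spoofed trials scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    # False rejection rate: genuine trials scored below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])

    # Pick the threshold where the two error rates are closest and average them.
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0

# Toy scores (made up for illustration only).
toy_genuine = [0.9, 0.8, 0.7, 0.2]
toy_spoof = [0.1, 0.3, 0.4, 0.75]
toy_eer = compute_eer(toy_genuine, toy_spoof)
```

With these toy scores, one genuine trial and one spoofed trial out of four fall on the wrong side of the threshold 0.7, so both error rates are 25% there. Evaluation toolkits typically interpolate between thresholds; this sketch simply takes the nearest crossing.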

Key findings

The AFN achieves an evaluation EER of 8.99% on the ASVspoof 2017 Version 2.0 dataset. System fusion lowers the EER further to 8.54%, a 30% relative improvement over the ASVspoof 2017 enhanced baseline system. Visualization of the attention heatmaps shows that the network focuses on discriminative frequency components.
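As a quick check on how the relative-improvement figure relates to the EER values, the arithmetic can be sketched as follows. The helper names are hypothetical, and the baseline EER is not stated on this page: the value computed here is only what the quoted 8.54% and (rounded) 30% figures imply, so it should be read as approximate.

```python
def relative_improvement(baseline_eer, system_eer):
    """Fractional reduction in EER relative to a baseline (0.30 means 30%)."""
    return (baseline_eer - system_eer) / baseline_eer

def implied_baseline(system_eer, rel_improvement):
    """Baseline EER implied by a system EER and a relative improvement."""
    return system_eer / (1.0 - rel_improvement)

# Figures quoted above: fused-system EER of 8.54% and a 30% relative improvement.
approx_baseline = implied_baseline(8.54, 0.30)  # roughly 12.2 (% EER)
```

Because "30%" is a rounded figure, the implied baseline of roughly 12.2% EER is an estimate, not a reported result.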

Approach

The proposed Attentive Filtering Network (AFN) employs an attention mechanism to filter and enhance features in both time and frequency domains. This enhanced representation is then fed into a ResNet-based classifier for replay attack detection. The attention mechanism is learned end-to-end.
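The filtering step can be illustrated with a small NumPy sketch. It is not the authors' implementation: the residual form `heatmap * spec + spec`, the sigmoid normalisation, and the function `toy_attention_logits` (a hand-written stand-in for the learned attention network) are all assumptions made for illustration. In the actual system the heatmap would come from a network trained end-to-end together with the classifier.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def toy_attention_logits(spec):
    """Hypothetical stand-in for the learned attention network.

    Standardises each frequency bin over time so that unusually strong
    time-frequency cells receive larger logits. A real system would
    replace this with a trained model.
    """
    mean = spec.mean(axis=1, keepdims=True)
    std = spec.std(axis=1, keepdims=True) + 1e-8
    return (spec - mean) / std

def attentive_filter(spec, logits_fn=toy_attention_logits):
    """Re-weight a (frequency x time) feature map with an attention heatmap.

    Assumed form: enhanced = heatmap * spec + spec, i.e. element-wise
    attention plus a residual copy of the input.
    """
    heatmap = sigmoid(logits_fn(spec))  # same shape as spec, values in (0, 1)
    enhanced = heatmap * spec + spec
    return enhanced, heatmap

# Random array standing in for a spectrogram-like input (frequency x time).
rng = np.random.default_rng(0)
spec = rng.normal(size=(64, 100))
enhanced, heatmap = attentive_filter(spec)
# `enhanced` would then be passed to the ResNet-based classifier.
```

Keeping the heatmap as a separate output is what makes visualization possible: it can be plotted over the time and frequency axes to see which regions the filter emphasises.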

Datasets

ASVspoof 2017 Version 2.0 dataset

Model(s)

Dilated Residual Network (DRN), U-Net (within Attentive Filtering)

Author countries

USA, UK, Portugal, Japan
