A study on the role of subsidiary information in replay attack spoofing detection

Authors: Jee-weon Jung, Hye-jin Shim, Hee-Soo Heo, Ha-Jin Yu

Published: 2020-01-31 07:45:03+00:00

AI Summary

This study investigates the impact of subsidiary information (room size, reverberation, etc.) on replay attack detection in audio. Using adversarial and multi-task learning frameworks, the researchers analyze whether this information is implicitly present in deep neural network embeddings and whether explicitly incorporating it improves detection accuracy.

Abstract

In this study, we analyze the role of various categories of subsidiary information in conducting replay attack spoofing detection: 'Room Size', 'Reverberation', 'Speaker-to-ASV distance', 'Attacker-to-Speaker distance', and 'Replay Device Quality'. As a means of analyzing subsidiary information, we use two frameworks to either subtract a category of subsidiary information from, or include it in, the code extracted from a deep neural network. For subtraction, we utilize an adversarial process framework which makes the code orthogonal to the basis vectors of the subsidiary information. For addition, we utilize the multi-task learning framework to include subsidiary information in the code. All experiments are conducted using the ASVspoof 2019 physical access scenario with the provided metadata. Through the analysis of the results of the two approaches, we conclude that various categories of subsidiary information do not sufficiently reside in the code when the deep neural network is trained for binary classification. Explicitly including various categories of subsidiary information through the multi-task learning framework can help improve performance in the closed-set condition.

Key findings

Subsidiary information is not sufficiently present in DNN embeddings trained for binary classification. Incorporating this information via MTL improves performance in closed-set conditions, but the improvement is less pronounced in open-set scenarios due to limited and ambiguous labels for the subsidiary information.

Approach

The authors employ two frameworks on deep neural network embeddings extracted from audio spectrograms: cosine adversarial networks (CAN) to remove subsidiary information from the embeddings, and multi-task learning (MTL) to incorporate it into them. Performance is evaluated using the equal error rate (EER).
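
As an illustration of the evaluation metric, here is a minimal sketch of an EER computation in plain Python. It is not the official ASVspoof scoring tool. The function name `equal_error_rate`, the convention that higher scores mean "more likely bona fide", and the simple threshold sweep are all assumptions made for this example.

```python
def equal_error_rate(bona_fide_scores, spoof_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumption: a higher score means the trial looks more like bona fide speech.
    """
    thresholds = sorted(set(bona_fide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bona_fide_scores) / len(bona_fide_scores)
        # False acceptance rate: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


if __name__ == "__main__":
    bona_fide = [0.9, 0.8, 0.7, 0.3]   # toy scores, not real results
    spoofed = [0.1, 0.2, 0.4, 0.6]
    print(equal_error_rate(bona_fide, spoofed))
```

With only a handful of trials the sweep gives a coarse estimate. Production scoring scripts typically interpolate between operating points instead of picking the nearest one.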

Datasets

ASVspoof 2019 physical access scenario dataset

Model(s)

End-to-end (E2E) deep neural network (DNN) with residual blocks and GRU layers. Cosine Adversarial Network (CAN) and Multi-Task Learning (MTL) frameworks are used in conjunction with the DNN.
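
The two auxiliary objectives can be sketched in a few lines of plain Python. The summary does not give the exact loss formulas, so everything below is an assumption for illustration: the mean absolute cosine similarity used as an orthogonality penalty, the weighted sum of cross-entropies used as the multi-task loss, and the names `orthogonality_penalty`, `multi_task_loss`, and `aux_weight`.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def orthogonality_penalty(code, basis_vectors):
    """Adversarial-style term: mean |cos| between the code and each basis vector.

    Driving this toward 0 pushes the code to be orthogonal to the basis
    vectors of the subsidiary information (assumed form, not the paper's).
    """
    return sum(abs(cosine(code, w)) for w in basis_vectors) / len(basis_vectors)


def cross_entropy(logits, target):
    """Softmax cross-entropy for one example, using a stable log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def multi_task_loss(spoof_logits, spoof_label, aux_logits, aux_label, aux_weight=1.0):
    """MTL-style term: spoofing loss plus a weighted subsidiary-category loss.

    aux_weight is a hypothetical hyperparameter introduced for this sketch.
    """
    return (cross_entropy(spoof_logits, spoof_label)
            + aux_weight * cross_entropy(aux_logits, aux_label))


if __name__ == "__main__":
    code = [1.0, 0.0]
    print(orthogonality_penalty(code, [[0.0, 1.0], [0.0, -2.0]]))  # already orthogonal
    print(multi_task_loss([2.0, -1.0], 0, [0.1, 0.3, 0.2], 1, aux_weight=0.5))
```

In this sketch the first function removes information (the code is penalized for aligning with any subsidiary basis vector), while the second adds it (the shared code must also predict the subsidiary label).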

Author countries

South Korea
