Efficient Attention Branch Network with Combined Loss Function for Automatic Speaker Verification Spoof Detection


Authors: Amir Mohammad Rostami, Mohammad Mehdi Homayounpour, Ahmad Nickabadi

Published: 2021-09-05 12:10:16+00:00

AI Summary

This paper proposes the Efficient Attention Branch Network (EABN) for automatic speaker verification spoof detection, addressing the poor generalization of existing models. EABN uses an attention branch to generate interpretable attention masks, which improve the classification performance of a perception branch built primarily on the EfficientNet-A0 architecture.

Abstract

Many endeavors have sought to develop countermeasure techniques as enhancements to Automatic Speaker Verification (ASV) systems, in order to make them more robust against spoof attacks. As evidenced by the latest ASVspoof 2019 countermeasure challenge, models currently deployed for the task of ASV still generalize poorly to unseen attacks, even at their best. Upon further investigation of the proposed methods, it appears that a broader three-tiered view of the proposed systems, comprising the classifier, the feature extraction phase, and the model loss function, may to some extent lessen the problem. Accordingly, the present study proposes the Efficient Attention Branch Network (EABN) modular architecture with a combined loss function to address the generalization problem...

Key findings

The EABN achieved EER = 0.86% and t-DCF = 0.0239 in the Physical Access scenario using log-PowSpec features and EfficientNet-A0. In the Logical Access scenario using LFCC features and SE-Res2Net50, it achieved EER = 1.89% and t-DCF = 0.507, outperforming other single systems. The attention masks provided interpretable insights into the model's decision-making process.
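The equal error rate (EER) is the operating point at which the rate of rejected bona fide trials equals the rate of accepted spoofed trials; an EER of 0.86% means both error rates are roughly 0.0086 at that threshold. The sketch below is a simple threshold sweep, assuming higher scores mean "more likely bona fide". `compute_eer` is a hypothetical helper, and the official ASVspoof evaluation tooling may interpolate between thresholds differently. t-DCF is not sketched because it also depends on the ASV system's own error rates and on cost and prior parameters.

```python
from typing import Sequence, Tuple


def compute_eer(
    bonafide_scores: Sequence[float], spoof_scores: Sequence[float]
) -> Tuple[float, float]:
    """Return (eer, threshold) by sweeping every observed score as a threshold.

    Assumes higher scores mean "more likely bona fide"; a trial is accepted
    as bona fide when score >= threshold.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("need at least one bona fide and one spoof score")
    best = None  # (|FRR - FAR|, EER estimate, threshold)
    for t in sorted(set(bonafide_scores) | set(spoof_scores)):
        # False rejection: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            best = (gap, (frr + far) / 2.0, t)
    return best[1], best[2]


bonafide = [0.9, 0.8, 0.7, 0.3]
spoof = [0.1, 0.2, 0.4, 0.75]
print(compute_eer(bonafide, spoof))  # (0.25, 0.7)
```

With real score files the sweep is the same; only the number of candidate thresholds grows.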

Approach

The EABN architecture consists of two branches: an attention branch that produces interpretable attention masks, and a perception branch (EfficientNet-A0 or SE-Res2Net50) that performs spoof detection. Training uses a combined loss function that includes triplet center loss, to improve discriminative ability and generalization.
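The exact terms and weights of the combined loss are not spelled out here. As an illustration only, the sketch assumes cross-entropy on the bona fide/spoof logits plus a weighted triplet center loss, in which each embedding is pulled toward its own class center and pushed at least a margin farther from the nearest other center. The weight `lam`, the `margin`, the squared Euclidean distance, and batch averaging are all assumptions, not values taken from the paper; the function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_center_loss(
    embeddings: List[Vector], labels: List[int], centers: List[Vector], margin: float = 1.0
) -> float:
    """Mean over the batch of max(D(f, c_y) + margin - min_{j != y} D(f, c_j), 0)."""
    if len(centers) < 2:
        raise ValueError("triplet center loss needs at least two class centers")
    total = 0.0
    for emb, y in zip(embeddings, labels):
        d_pos = squared_distance(emb, centers[y])  # distance to own class center
        d_neg = min(  # distance to the nearest other class center
            squared_distance(emb, c) for j, c in enumerate(centers) if j != y
        )
        total += max(d_pos + margin - d_neg, 0.0)
    return total / len(embeddings)


def cross_entropy(logits: List[Vector], labels: List[int]) -> float:
    """Mean softmax cross-entropy, using log-sum-exp for numerical stability."""
    total = 0.0
    for z, y in zip(logits, labels):
        m = max(z)
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
        total += log_sum_exp - z[y]
    return total / len(logits)


def combined_loss(logits, embeddings, labels, centers, lam=0.1, margin=1.0) -> float:
    """Hypothetical combination: cross-entropy + lam * triplet center loss."""
    return cross_entropy(logits, labels) + lam * triplet_center_loss(
        embeddings, labels, centers, margin
    )


# Class 0 = bona fide, class 1 = spoof; 2-D embeddings for readability.
centers = [[0.0, 0.0], [4.0, 0.0]]
print(triplet_center_loss([[2.0, 0.0]], [0], centers))  # 1.0
```

An embedding equidistant from both centers is penalized by exactly the margin, which is what drives the classes apart in embedding space.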

Datasets

ASVspoof 2019 dataset (Physical Access and Logical Access scenarios)

Model(s)

Efficient Attention Branch Network (EABN) with EfficientNet-A0 and SE-Res2Net50 as options for the perception branch.

Author countries

Iran
