Graph Attention Networks for Anti-Spoofing

Authors: Hemlata Tak, Jee-weon Jung, Jose Patino, Massimiliano Todisco, Nicholas Evans

Published: 2021-04-08 10:18:17+00:00

AI Summary

This paper proposes using Graph Attention Networks (GATs) to improve spoofing detection in automatic speaker verification by modeling the relationships between neighbouring spectral sub-bands or temporal segments. Experiments on the ASVspoof 2019 logical access database show that the GAT-based model with temporal attention outperforms all of the authors' baseline single systems, and that fusing GAT-based models with more conventional countermeasures yields a 47% relative improvement over the best performing single GAT system.

Abstract

The cues needed to detect spoofing attacks against automatic speaker verification are often located in specific spectral sub-bands or temporal segments. Previous works show the potential to learn these using either spectral or temporal self-attention mechanisms but not the relationships between neighbouring sub-bands or segments. This paper reports our use of graph attention networks (GATs) to model these relationships and to improve spoofing detection performance. GATs leverage a self-attention mechanism over graph structured data to model the data manifold and the relationships between nodes. Our graph is constructed from representations produced by a ResNet. Nodes in the graph represent information either in specific sub-bands or temporal segments. Experiments performed on the ASVspoof 2019 logical access database show that our GAT-based model with temporal attention outperforms all of our baseline single systems. Furthermore, GAT-based systems are complementary to a set of existing systems. The fusion of GAT-based models with more conventional countermeasures delivers a 47% relative improvement in performance compared to the best performing single GAT system.


Key findings
The GAT-based model with temporal attention outperforms all of the authors' baseline single systems. Fusing GAT-based models with more conventional countermeasures yields a 47% relative improvement in performance compared to the best performing single GAT system. Different attacks exhibit different artifacts, which highlights the complementarity of the spectral and temporal attention mechanisms.
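As a reading aid for the 47% figure, the sketch below shows how a relative improvement is usually computed for an error-type metric where lower is better. The metric values in it are hypothetical placeholders chosen only to reproduce a 47% ratio; they are not results from the paper, and the function name is an assumption made for illustration.

```python
def relative_improvement(single_system: float, fused_system: float) -> float:
    """Relative reduction of a lower-is-better metric (for example an error rate).

    Returns a fraction: 0.47 means the fused system's metric is 47% lower
    than that of the single system.
    """
    if single_system <= 0:
        raise ValueError("single_system must be positive")
    return (single_system - fused_system) / single_system


# Hypothetical placeholder values, NOT taken from the paper.
best_single_gat = 0.100
fused = 0.053

gain = relative_improvement(best_single_gat, fused)
print(f"relative improvement: {gain:.0%}")
```

A relative figure of this kind says nothing about the absolute metric values, so it should be read together with the scores reported in the paper itself.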
Approach
The authors use a ResNet-18 to extract high-level representations from audio features. These representations are then arranged as a graph in which each node represents the information in one spectral sub-band or one temporal segment. A GAT layer applies self-attention over this graph to model the relationships between nodes, and the resulting representation is used for spoofing detection.
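The graph attention step described above can be illustrated with a minimal sketch. It follows the widely used single-head GAT formulation (a shared linear projection, a LeakyReLU-scored attention vector, and a softmax over neighbours) on a fully connected graph; the exact layer used in the paper may differ. The node vectors are random placeholders standing in for per-sub-band or per-segment ResNet outputs, and all names (`gat_layer`, `W`, `a`) and sizes are assumptions made for illustration.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gat_layer(nodes, W, a):
    """Single-head graph attention over a fully connected graph.

    nodes: N feature vectors of length F_in (one per sub-band or segment)
    W:     F_out x F_in projection shared by all nodes
    a:     attention vector of length 2 * F_out
    Returns (updated node vectors, N x N attention weights).
    """
    projected = [matvec(W, h) for h in nodes]
    updated, attention = [], []
    for hi in projected:
        # Score node i against every node j (including itself).
        scores = [
            leaky_relu(sum(w * v for w, v in zip(a, hi + hj)))
            for hj in projected
        ]
        alpha = softmax(scores)
        attention.append(alpha)
        # New representation: attention-weighted sum of projected nodes.
        updated.append([
            sum(alpha[j] * projected[j][k] for j in range(len(projected)))
            for k in range(len(hi))
        ])
    return updated, attention


# Toy example: 4 nodes with 3 input features, projected to 2 features.
random.seed(0)
N, F_IN, F_OUT = 4, 3, 2
nodes = [[random.gauss(0, 1) for _ in range(F_IN)] for _ in range(N)]
W = [[random.gauss(0, 1) for _ in range(F_IN)] for _ in range(F_OUT)]
a = [random.gauss(0, 1) for _ in range(2 * F_OUT)]

out, att = gat_layer(nodes, W, a)
for i, row in enumerate(att):
    print(i, [round(x, 3) for x in row])
```

Each row of `att` sums to one and indicates how strongly a node attends to every other node. In a trained model, `W` and `a` would be learned, and the updated node vectors would be pooled into a single representation before classification; that pooling step is omitted here.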
Datasets
ASVspoof 2019 logical access database
Model(s)
ResNet-18, Graph Attention Networks (GATs)
Author countries
France, South Korea