ELEAT-SAGA: Early & Late Integration with Evading Alternating Training for Spoof-Robust Speaker Verification

Authors: Amro Asali, Yehuda Ben-Shimol, Itshak Lapidot

Published: 2026-02-14 13:16:25+00:00

AI Summary

This paper proposes ELEAT-SAGA, a novel spoofing-robust automatic speaker verification (SASV) architecture that introduces score-aware gated attention (SAGA) to dynamically modulate speaker embeddings using countermeasure (CM) scores. It integrates a pre-trained ECAPA-TDNN for speaker embeddings and AASIST for CM scores, exploring early, late, and full integration strategies, alongside a new training procedure called evading alternating training (EAT). The system achieves significant performance improvements on the ASVspoof 2019 LA and SpoofCeleb datasets, demonstrating the effectiveness of score-aware attention and alternating training in enhancing SASV robustness.

Abstract

Spoofing-robust automatic speaker verification (SASV) seeks to build automatic speaker verification systems that are robust against both zero-effort impostor attacks and sophisticated spoofing techniques such as voice conversion (VC) and text-to-speech (TTS). In this work, we propose a novel SASV architecture, SASV-SAGA, that introduces score-aware gated attention (SAGA), enabling dynamic modulation of speaker embeddings based on countermeasure (CM) scores. By integrating speaker embeddings and CM scores from pre-trained ECAPA-TDNN and AASIST models respectively, we explore several integration strategies, including early, late, and full integration. We further introduce alternating training for multi-module (ATMM) and a refined variant, evading alternating training (EAT). Experimental results on the ASVspoof 2019 Logical Access (LA) and SpoofCeleb datasets demonstrate significant improvements over baselines, achieving a spoofing-aware speaker verification equal error rate (SASV-EER) of 1.22% and a minimum normalized agnostic detection cost function (min a-DCF) of 0.0304 on the ASVspoof 2019 evaluation set. These results confirm the effectiveness of score-aware attention mechanisms and alternating training strategies in enhancing the robustness of SASV systems.


Key findings
The ELEAT-SAGA system significantly outperforms the baselines, achieving an SASV-EER of 1.22% and a min a-DCF of 0.0304 on the ASVspoof 2019 evaluation set. The findings confirm that score-aware attention mechanisms and alternating training strategies (EAT) effectively enhance SASV robustness and generalization, with full SAGA integration consistently outperforming the other fusion approaches. Additionally, incorporating early CM features and bypassing the SAGA operation when training on noisy out-of-domain bona fide data improved both generalization and training efficiency.
Approach
The proposed SASV-SAGA architecture integrates speaker embeddings from ECAPA-TDNN and CM scores from AASIST via score-aware gated attention (SAGA), which dynamically modulates the speaker embeddings based on the CM scores. The work explores early, late, and full integration strategies and introduces evading alternating training (EAT) for multi-module optimization, which strategically bypasses the SAGA operation when training with out-of-domain ASV data and incorporates early CM features.
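The gating idea described above can be sketched in a few lines. The paper's exact parameterization is not reproduced here, so the function `saga_gate` and the parameters `W` and `b` are illustrative assumptions: a sigmoid gate vector is derived from the scalar CM score and applied element-wise to the speaker embedding, so that the CM evidence modulates how much of each embedding dimension survives.

```python
import numpy as np

def sigmoid(x):
    # Numerically standard logistic function
    return 1.0 / (1.0 + np.exp(-x))

def saga_gate(spk_emb, cm_score, W, b):
    """Illustrative score-aware gating (not the paper's exact formulation).

    A per-dimension gate in (0, 1) is computed from the scalar CM score
    and multiplied element-wise into the speaker embedding.
    """
    gate = sigmoid(W * cm_score + b)  # shape (d,), one gate per dimension
    return gate * spk_emb

rng = np.random.default_rng(0)
d = 192  # a typical ECAPA-TDNN embedding size, used here for illustration
emb = rng.standard_normal(d)
W = rng.standard_normal(d)  # hypothetical learned gate weights
b = np.zeros(d)             # hypothetical learned gate bias

# A high CM score (bona fide) and a low CM score (spoof) produce
# different gatings of the same speaker embedding.
gated_bona = saga_gate(emb, cm_score=3.0, W=W, b=b)
gated_spoof = saga_gate(emb, cm_score=-3.0, W=W, b=b)
```

Because the gate is a sigmoid, each output coordinate is a damped copy of the input coordinate; in the full system the gate parameters would be trained jointly with the downstream SASV objective.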
Datasets
ASVspoof 2019 Logical Access (LA), Spoofceleb, VoxCeleb1, VoxCeleb2
Model(s)
ECAPA-TDNN, AASIST, SKA-TDNN (for SpoofCeleb experiments)
Author countries
Israel, France