Channel-wise Gated Res2Net: Towards Robust Detection of Synthetic Speech Attacks

Authors: Xu Li, Xixin Wu, Hui Lu, Xunying Liu, Helen Meng

Published: 2021-07-19 12:27:40+00:00

AI Summary

This paper proposes Channel-wise Gated Res2Net (CG-Res2Net), a novel architecture that improves the generalizability of synthetic speech detection systems to unseen attacks. It does so by adding a channel-wise gating mechanism to the connections between feature groups in the Res2Net block. The gate dynamically selects relevant channels and suppresses less relevant ones.

Abstract

Existing approaches for anti-spoofing in automatic speaker verification (ASV) still lack generalizability to unseen attacks. The Res2Net approach designs a residual-like connection between feature groups within one block, which increases the possible receptive fields and improves the system's detection generalizability. However, such a residual-like connection is performed by a direct addition between feature groups without channel-wise priority. We argue that the information across channels may not contribute to spoofing cues equally, and the less relevant channels are expected to be suppressed before adding onto the next feature group, so that the system can generalize better to unseen attacks. This argument motivates the present work, which introduces a novel channel-wise gated Res2Net (CG-Res2Net) that modifies Res2Net to enable a channel-wise gating mechanism in the connection between feature groups. This gating mechanism dynamically selects channel-wise features based on the input, to suppress the less relevant channels and enhance the detection generalizability. Three gating mechanisms with different structures are proposed and integrated into Res2Net. Experiments on the ASVspoof 2019 logical access (LA) dataset demonstrate that the proposed CG-Res2Net significantly outperforms Res2Net on both the overall LA evaluation set and individual difficult unseen attacks, and that it also outperforms other state-of-the-art single systems, demonstrating the effectiveness of our method.
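To make the difference between a direct addition and a gated connection concrete, here is a minimal pure-Python sketch of the hierarchical pass inside one Res2Net-style block. It follows the standard Res2Net split (y_1 = x_1, y_2 = K(x_2), y_i = K(x_i + y_{i-1}) for i > 2) and, optionally, scales y_{i-1} channel-wise before the addition. Several details are assumptions for illustration only: each feature group is modelled as a list of channels, the hypothetical `identity_k` stands in for the per-group 3x3 convolution, and the hypothetical `channel_gate` (average pooling, a linear map, then a sigmoid, with arbitrary fixed weights) is not the paper's exact gate formulation.

```python
import math
from typing import Callable, List, Optional

Channel = List[float]   # one channel, flattened over time and frequency
Group = List[Channel]   # one feature group = a list of channels


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def add_groups(a: Group, b: Group) -> Group:
    """Element-wise sum of two feature groups with the same shape."""
    return [[u + v for u, v in zip(ca, cb)] for ca, cb in zip(a, b)]


def channel_gate(y_prev: Group, weights: List[List[float]], bias: List[float]) -> List[float]:
    """Hypothetical gate: average-pool each channel, apply a linear map, then a
    sigmoid, giving one weight in (0, 1) per channel."""
    pooled = [sum(ch) / len(ch) for ch in y_prev]
    return [sigmoid(sum(w * p for w, p in zip(row, pooled)) + b)
            for row, b in zip(weights, bias)]


def res2net_hierarchy(groups: List[Group],
                      k_transform: Callable[[Group], Group],
                      gate: Optional[Callable[[Group], List[float]]] = None) -> List[Group]:
    """Hierarchical pass over feature groups x_1..x_s:
         y_1 = x_1,  y_2 = K(x_2),  y_i = K(x_i + z_{i-1}) for i > 2,
    where z_{i-1} = y_{i-1} in plain Res2Net, or y_{i-1} scaled channel-wise
    by gate(y_{i-1}) in the gated variant. In a full block the y_i would then
    be concatenated and fused by a 1x1 convolution (omitted here)."""
    outputs = [groups[0]]                        # y_1: passed through unchanged
    if len(groups) > 1:
        outputs.append(k_transform(groups[1]))  # y_2: no incoming connection
    for x_i in groups[2:]:
        z_prev = outputs[-1]
        if gate is not None:
            g = gate(z_prev)                     # one weight per channel
            z_prev = [[g_c * v for v in ch] for g_c, ch in zip(g, z_prev)]
        outputs.append(k_transform(add_groups(x_i, z_prev)))
    return outputs


# Toy usage: 3 groups, 2 channels each, 2 values per channel.
def identity_k(group: Group) -> Group:
    """Placeholder for the per-group 3x3 convolution K_i."""
    return group


def demo_gate(y_prev: Group) -> List[float]:
    """Arbitrary fixed weights: keep channel 0, suppress channel 1."""
    return channel_gate(y_prev, weights=[[0.0, 0.0], [0.0, 0.0]], bias=[10.0, -10.0])


x = [[[1.0, 1.0], [2.0, 2.0]],
     [[1.0, 1.0], [1.0, 1.0]],
     [[0.0, 0.0], [0.0, 0.0]]]
plain = res2net_hierarchy(x, identity_k)
gated = res2net_hierarchy(x, identity_k, gate=demo_gate)
print(plain[2])  # [[1.0, 1.0], [1.0, 1.0]]
print(gated[2])  # channel 1 is scaled almost to zero
```

In the plain pass, every channel of y_2 is added onto x_3 at full strength; in the gated pass, the second channel is almost entirely suppressed before the addition, which is the behaviour the abstract argues for. Note that only the connection is gated: the stored y_2 itself is unchanged.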


Key findings
The proposed CG-Res2Net models significantly outperform Res2Net and other state-of-the-art single systems on the ASVspoof 2019 LA evaluation set, especially on difficult unseen attacks. MCG-Res2Net50 achieves the best overall performance, while MLCG-Res2Net50 performs best on the most difficult unseen attack (A17).
Approach
The authors modify the Res2Net architecture by adding a channel-wise gating mechanism to the residual-like connections between feature groups. The gate dynamically weights channels based on the input, suppressing less relevant channels before they are added to the next feature group, which improves generalization to unseen attacks. Three gating mechanisms with different structures are proposed and compared.
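The three variants differ in how the gate is computed. The summary does not spell out their structures, so the following pure-Python sketch is only one plausible reading, assuming the prefixes stand for a single-group gate (SCG, conditioned on y_{i-1} alone), a multi-group gate (MCG, conditioned on both y_{i-1} and the next group x_i), and a multi-group latent gate (MLCG, which first projects both inputs into a smaller latent space). The helper names (`scg_gate`, `mcg_gate`, `mlcg_gate`, `gap`, `linear`) and all weights are hypothetical; the paper's exact layer choices may differ.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]
Group = List[List[float]]   # list of channels, each flattened over time-frequency


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def linear(w: Matrix, v: Vector) -> Vector:
    """Matrix-vector product (a bias-free fully connected layer)."""
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def gap(group: Group) -> Vector:
    """Global average pooling: one summary value per channel."""
    return [sum(ch) / len(ch) for ch in group]


def scg_gate(y_prev: Group, x_next: Group, w: Matrix) -> Vector:
    """Hypothetical single-group gate: looks only at y_{i-1}."""
    return [sigmoid(v) for v in linear(w, gap(y_prev))]


def mcg_gate(y_prev: Group, x_next: Group, w: Matrix) -> Vector:
    """Hypothetical multi-group gate: conditions on both y_{i-1} and x_i
    by concatenating their pooled summaries."""
    return [sigmoid(v) for v in linear(w, gap(y_prev) + gap(x_next))]


def mlcg_gate(y_prev: Group, x_next: Group,
              w_y: Matrix, w_x: Matrix, w_out: Matrix) -> Vector:
    """Hypothetical multi-group latent gate: projects both summaries into a
    smaller latent space, combines them with a ReLU, then expands back."""
    latent = [max(0.0, a + b)
              for a, b in zip(linear(w_y, gap(y_prev)), linear(w_x, gap(x_next)))]
    return [sigmoid(v) for v in linear(w_out, latent)]


# Toy inputs: 2 channels; all weights are arbitrary, not learned.
y_prev = [[1.0, 3.0], [0.0, 0.0]]   # pooled -> [2.0, 0.0]
x_next = [[0.0, 0.0], [4.0, 4.0]]   # pooled -> [0.0, 4.0]
scg = scg_gate(y_prev, x_next, w=[[1.0, 0.0], [0.0, 1.0]])
mcg = mcg_gate(y_prev, x_next, w=[[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]])
mlcg = mlcg_gate(y_prev, x_next, w_y=[[1.0, 0.0]], w_x=[[0.0, 1.0]],
                 w_out=[[1.0], [-1.0]])
```

Each function returns one weight per channel, which would multiply y_{i-1} before it is added to x_i. In this toy setup the SCG gate for the second channel stays at 0.5 because y_{i-1} is silent there, while the MCG gate rises because x_i is active in that channel; that extra input is what distinguishes the assumed multi-group variants from the single-group one.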
Datasets
ASVspoof 2019 logical access (LA) partition
Model(s)
ResNet50, Res2Net50, SCG-Res2Net50, MCG-Res2Net50, MLCG-Res2Net50
Author countries
China, United Kingdom