Generalizable Audio Deepfake Detection via Hierarchical Structure Learning and Feature Whitening in Poincaré sphere
Authors: Mingru Yang, Yanmei Gu, Qianhua He, Yanxiong Li, Peirong Zhang, Yongqiang Chen, Zhiming Wang, Huijia Zhu, Jian Liu, Weiqiang Wang
Published: 2025-08-03 19:16:29+00:00
AI Summary
This paper introduces Poin-HierNet, a novel framework for generalizable audio deepfake detection. Poin-HierNet leverages the Poincaré sphere to construct domain-invariant hierarchical representations, outperforming state-of-the-art methods in Equal Error Rate.
Abstract
Audio deepfake detection (ADD) faces critical generalization challenges due to diverse real-world spoofing attacks and domain variations. However, existing methods primarily rely on Euclidean distances, failing to adequately capture the intrinsic hierarchical structures associated with attack categories and domain factors. To address these issues, we design a novel framework Poin-HierNet to construct domain-invariant hierarchical representations in the Poincar'e sphere. Poin-HierNet includes three key components: 1) Poincar'e Prototype Learning (PPL) with several data prototypes aligning sample features and capturing multilevel hierarchies beyond human labels; 2) Hierarchical Structure Learning (HSL) leverages top prototypes to establish a tree-like hierarchical structure from data prototypes; and 3) Poincar'e Feature Whitening (PFW) enhances domain invariance by applying feature whitening to suppress domain-sensitive features. We evaluate our approach on four datasets: ASVspoof 2019 LA, ASVspoof 2021 LA, ASVspoof 2021 DF, and In-The-Wild. Experimental results demonstrate that Poin-HierNet exceeds state-of-the-art methods in Equal Error Rate.