HyperPotter: Spell the Charm of High-Order Interactions in Audio Deepfake Detection

Authors: Qing Wen, Haohao Li, Zhongjie Ba, Peng Cheng, Miao He, Li Lu, Kui Ren

Published: 2026-02-05 13:53:14+00:00

Comment: 20 pages, 8 figures

AI Summary

This paper introduces HyperPotter, a hypergraph-based framework for audio deepfake detection that explicitly models high-order interactions (HOIs) using clustering-based hyperedges and class-aware prototype initialization. By capturing synergistic patterns beyond pairwise relations, HyperPotter significantly improves detection generalization across diverse spoofing attacks and speaker conditions. Experiments show it outperforms its baseline by an average relative gain of 22.15% across 11 datasets and surpasses state-of-the-art methods by 13.96% on 4 challenging cross-domain datasets.

Abstract

Advances in AIGC technologies have enabled the synthesis of highly realistic audio deepfakes capable of deceiving human auditory perception. Although numerous audio deepfake detection (ADD) methods have been developed, most rely on local temporal/spectral features or pairwise relations, overlooking high-order interactions (HOIs). HOIs capture discriminative patterns that emerge from multiple feature components beyond their individual contributions. We propose HyperPotter, a hypergraph-based framework that explicitly models these synergistic HOIs through clustering-based hyperedges with class-aware prototype initialization. Extensive experiments show that HyperPotter surpasses its baseline by an average relative gain of 22.15% across 11 datasets and outperforms state-of-the-art methods by 13.96% on 4 challenging cross-domain datasets, demonstrating superior generalization to diverse attacks and speakers.


Key findings
HyperPotter significantly improves generalization, outperforming its baseline by an average relative gain of 22.15% across 11 datasets and surpassing state-of-the-art methods by 13.96% on 4 challenging cross-domain datasets. It achieves the best EER on critical benchmarks such as In-the-Wild, ASVspoof2021 DF, and FoR, demonstrating superior robustness to diverse attacks and speakers. Ablation studies confirm the necessity of its hypergraph modeling, relational artifact amplification, and prototype banks, though severe channel interference can obscure high-order interdependencies.
Approach
HyperPotter formulates audio deepfake detection as a graph-level classification problem using a memory-enhanced hypergraph attention network (HAGNN). It models high-order interactions through Fuzzy C-Means (FCM) based hyperedges, initialized with a class-aware prototype bank, and amplifies relational artifacts via an attention-driven mechanism. This prototype-guided hyperedge construction allows for long-term memorization and efficient relational modeling of complex synthetic artifacts.
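The prototype-guided hyperedge construction described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it runs standard Fuzzy C-Means over frame-level features, seeds the cluster centers from a class-aware prototype bank (here simply per-class feature means), and derives a hyperedge incidence matrix by thresholding the fuzzy memberships. The function name `fcm_hyperedges`, the membership threshold `tau`, and the fuzzifier `m` are illustrative assumptions.

```python
import numpy as np

def fcm_hyperedges(X, prototypes, m=2.0, iters=50, tau=0.3):
    """Cluster features X (N, D) with Fuzzy C-Means, seeding the C
    centers from a class-aware prototype bank (C, D), and return a
    hyperedge incidence matrix H (N, C).  Illustrative sketch only."""
    centers = prototypes.astype(float).copy()          # class-aware init
    for _ in range(iters):
        # squared distances to each center, shape (N, C); eps avoids /0
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1) + 1e-9
        # standard FCM membership update:
        #   u_ij = 1 / sum_k (d2_ij / d2_ik)^(1/(m-1)); rows sum to 1
        U = 1.0 / ((d2[:, :, None] / d2[:, None, :])
                   ** (1.0 / (m - 1))).sum(-1)
        # membership-weighted center update
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(0)[:, None]
    # node i joins hyperedge j when its membership exceeds tau;
    # each node always joins its strongest hyperedge
    H = (U >= tau).astype(float)
    H[np.arange(len(X)), U.argmax(1)] = 1.0
    return H, U, centers

# toy usage: two well-separated "classes" of feature vectors
rng = np.random.default_rng(0)
real = rng.normal(0.0, 0.1, (20, 4))
fake = rng.normal(1.0, 0.1, (20, 4))
X = np.vstack([real, fake])
protos = np.stack([real.mean(0), fake.mean(0)])        # prototype bank
H, U, _ = fcm_hyperedges(X, protos)
```

In a full pipeline the resulting incidence matrix `H` would feed a hypergraph attention layer; here it just groups each node into the soft clusters it belongs to, with the prototype seeding biasing hyperedges toward class-consistent structure.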
Datasets
Training: ASVspoof2019 LA. Evaluation: In-the-Wild, ASVspoof2019 LA, ASVspoof2021 LA, ASVspoof2021 DF, ASVspoof2024, FoR, Codecfake, ADD2022 Track 1, ADD2022 Track 3, ADD2023 Track 1.2 Round 1, ADD2023 Track 1.2 Round 2, LibriVoc, SONAR, PartialSpoof.
Model(s)
HyperPotter (Hypergraph-based framework), Wav2Vec2-AASIST (backbone), XLS-R (pretrained SSL front-end), RawNet2 (encoder), Hypergraph Attention Layer (HAGNN), Fuzzy C-Means (FCM) clustering, Prototype Bank.
Author countries
China