SHIELD: A Secure and Highly Enhanced Integrated Learning for Robust Deepfake Detection against Adversarial Attacks

Authors: Kutub Uddin, Awais Khan, Muhammad Umar Farooq, Khalid Malik

Published: 2025-07-17 14:33:54+00:00

AI Summary

The paper proposes SHIELD, a collaborative learning method for robust audio deepfake detection against adversarial attacks. SHIELD integrates an auxiliary generative model to expose anti-forensic signatures and uses a triplet model to capture correlations between real and attacked audios, significantly improving robustness against generative adversarial attacks.

Abstract

Audio plays a crucial role in applications like speaker verification, voice-enabled smart devices, and audio conferencing. However, audio manipulations, such as deepfakes, pose significant risks by enabling the spread of misinformation. Our empirical analysis reveals that existing methods for detecting deepfake audio are often vulnerable to anti-forensic (AF) attacks, particularly those crafted using generative adversarial networks. In this article, we propose a novel collaborative learning method called SHIELD to defend against generative AF attacks. To expose AF signatures, we integrate an auxiliary generative model, called the defense (DF) generative model, which facilitates collaborative learning by combining its input and output. Furthermore, we design a triplet model that captures the correlations of real and AF-attacked audios with their real-generated and attacked-generated counterparts produced by the auxiliary generative models. The proposed SHIELD strengthens the defense against generative AF attacks and achieves robust performance across various generative models. The proposed AF attack significantly reduces the average detection accuracy from 95.49% to 59.77% on ASVspoof2019, from 99.44% to 38.45% on In-the-Wild, and from 98.41% to 51.18% on HalfTruth across three different generative models. The proposed SHIELD mechanism is robust against AF attacks, achieving average accuracies of 98.13%, 98.58%, and 99.57% in match settings, and 98.78%, 98.62%, and 98.85% in mismatch settings, for the ASVspoof2019, In-the-Wild, and HalfTruth datasets, respectively.


Key findings
Existing audio deepfake detection methods are highly vulnerable to anti-forensic attacks, especially generative ones. SHIELD significantly improves robustness against these attacks, achieving high accuracy (above 98%) across multiple datasets and attack scenarios. SHIELD outperforms state-of-the-art defense mechanisms.
Approach
SHIELD employs collaborative learning by integrating an auxiliary generative model (the defense model) to expose anti-forensic artifacts. A triplet model then captures correlations between real and attacked audios and their counterparts regenerated by the defense model, improving discrimination; an illustrative sketch of this idea follows.
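
The following is a minimal, hypothetical sketch (not the authors' implementation) of how a triplet objective could pair real and AF-attacked audio with versions regenerated by an auxiliary defense generator, as described above. All module names (DefenseGenerator, EmbeddingNet), layer choices, and hyperparameters are illustrative assumptions in a PyTorch-style setup; the paper's actual architectures (e.g., RawNet or SEGAN variants) and training details are not reproduced here.

# Illustrative sketch only: triplet learning over defense-regenerated audio.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DefenseGenerator(nn.Module):
    """Stand-in auxiliary (defense) generative model; a UNet/SEGAN-style
    network would be used in practice."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv1d(1, 1, kernel_size=9, padding=4)  # placeholder layer

    def forward(self, wav):            # wav: (batch, 1, samples)
        return self.net(wav)           # regenerated audio exposing AF residue


class EmbeddingNet(nn.Module):
    """Stand-in raw-waveform encoder (e.g., a RawNet-style backbone)."""
    def __init__(self, dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=251, stride=16), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(32, dim),
        )

    def forward(self, wav):
        return F.normalize(self.encoder(wav), dim=-1)


def shield_triplet_step(real, attacked, generator, embed, margin=0.3):
    """One illustrative step: real audio and its regenerated counterpart should
    embed closer together than real audio and regenerated AF-attacked audio."""
    real_gen = generator(real)          # real-generated pair
    attacked_gen = generator(attacked)  # attacked-generated pair

    anchor = embed(real)
    positive = embed(real_gen)
    negative = embed(attacked_gen)

    return F.triplet_margin_loss(anchor, positive, negative, margin=margin)


if __name__ == "__main__":
    gen, emb = DefenseGenerator(), EmbeddingNet()
    real = torch.randn(4, 1, 16000)      # 1 s of 16 kHz audio per sample
    attacked = torch.randn(4, 1, 16000)  # AF-attacked counterparts
    print(shield_triplet_step(real, attacked, gen, emb).item())

In this sketch the triplet margin loss simply encodes the intuition that defense-regenerated real audio stays close to the original while regenerated attacked audio is pushed apart; how SHIELD actually combines the generator's input and output for the final detector is specified in the paper, not here.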
Datasets
ASVspoof2019, HalfTruth, In-the-Wild
Model(s)
RawNet3, RawNet2, RawBoost, Res-TSSDNet, Inc-TSSDNet, ResNet, MS-ResNet (as baseline detectors); UNet, SEGAN, OPGAN (as generative adversarial networks for attacks and defense)
Author countries
USA