Analyzing Reasoning Shifts in Audio Deepfake Detection under Adversarial Attacks: The Reasoning Tax versus Shield Bifurcation

Authors: Binh Nguyen, Thai Le

Published: 2026-01-07 05:46:45+00:00

Comment: Preprint for ACL 2026 submission

AI Summary

This paper introduces a forensic auditing framework to evaluate the robustness of Audio Language Models' (ALMs) reasoning in audio deepfake detection under adversarial attacks. It analyzes reasoning shifts across acoustic perception, cognitive coherence, and cognitive dissonance, revealing that explicit reasoning does not universally enhance robustness. Instead, reasoning can act as a defensive "shield" for acoustically robust models but imposes a "tax" on others, while high cognitive dissonance can serve as a "silent alarm" for potential manipulation.

Abstract

Audio Language Models (ALMs) offer a promising shift towards explainable audio deepfake detection (ADD), moving beyond black-box classifiers by providing some level of transparency into their predictions via reasoning traces. This necessitates a new class of model robustness analysis: robustness of the predictive reasoning under adversarial attacks, which goes beyond the existing paradigm that mainly focuses on shifts of the final predictions (e.g., fake vs. real). To analyze such reasoning shifts, we introduce a forensic auditing framework to evaluate the robustness of ALMs' reasoning under adversarial attacks along three interconnected dimensions: acoustic perception, cognitive coherence, and cognitive dissonance. Our systematic analysis reveals that explicit reasoning does not universally enhance robustness. Instead, we observe a bifurcation: for models exhibiting robust acoustic perception, reasoning acts as a defensive "shield", protecting them from adversarial attacks. However, for others, it imposes a performance "tax", particularly under linguistic attacks, which reduce cognitive coherence and increase the attack success rate. Crucially, even when classification fails, high cognitive dissonance can serve as a "silent alarm", flagging potential manipulation. Overall, this work provides a critical evaluation of the role of reasoning in forensic audio deepfake analysis and its vulnerabilities.


Key findings
Explicit reasoning does not universally enhance robustness in ALMs; it acts as a "shield" for acoustically grounded models but imposes a "tax" on others, degrading performance under adversarial attacks. Cognitive dissonance serves as a valuable "silent alarm" under acoustic attacks, signaling internal conflict even when the final classification fails. Linguistic attacks are particularly insidious, as they induce "hallucinated consistency" where models confidently rationalize errors with high coherence and low dissonance, effectively masking the manipulation.
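As one way the "silent alarm" finding could be operationalized downstream, the short sketch below flags a sample whenever its audited dissonance score is high, independent of the final verdict; the threshold, function name, and scoring scale are hypothetical assumptions, not taken from the paper.

```python
# Hypothetical "silent alarm": treat a high cognitive-dissonance score from the
# reasoning audit as a warning of possible manipulation, even when the model's
# final fake/real verdict was accepted.
DISSONANCE_THRESHOLD = 0.5  # illustrative cutoff, not specified in the paper

def silent_alarm(dissonance_score: float, threshold: float = DISSONANCE_THRESHOLD) -> bool:
    """Return True when the reasoning trace conflicts strongly with the verdict."""
    return dissonance_score >= threshold
```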
Approach
The authors propose a three-tier forensic auditing framework to analyze the robustness of ALMs' reasoning under adversarial attacks. The framework evaluates acoustic perception (the model's accuracy in describing audio features), cognitive coherence (logical consistency between the reasoning trace and the final conclusion), and cognitive dissonance (conflict between the reasoning trace and a wrong verdict), and is applied to understand how reasoning shifts under both linguistic and acoustic adversarial attacks.
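As a rough sketch of how such a three-dimension audit could be scored, the Python snippet below uses toy stand-in scorers (keyword overlap and a simple agreement check); all function and field names are illustrative assumptions, not the paper's actual metrics or implementation.

```python
# Minimal sketch of the three-tier forensic audit with toy stand-in scorers.
# All names and scoring rules are illustrative; the paper's actual metrics
# (e.g., rubric-based judging of reasoning traces) are not reproduced here.
from dataclasses import dataclass

@dataclass
class AuditScores:
    acoustic_perception: float   # overlap between cited and reference acoustic cues, in [0, 1]
    cognitive_coherence: float   # agreement between the reasoning's evidence and the verdict
    cognitive_dissonance: float  # reasoning-vs-verdict conflict when the verdict is wrong

def cue_overlap(reasoning: str, reference_cues: set[str]) -> float:
    """Toy perception score: fraction of reference acoustic cues the model mentions."""
    text = reasoning.lower()
    mentioned = {cue for cue in reference_cues if cue in text}
    return len(mentioned) / len(reference_cues) if reference_cues else 0.0

def audit_response(reasoning: str, verdict: str, gold_label: str,
                   reference_cues: set[str], fake_cues: set[str]) -> AuditScores:
    """Score a single ALM response along the three audit dimensions."""
    perception = cue_overlap(reasoning, reference_cues)
    # Toy coherence: the cited evidence points the same way as the final verdict.
    evidence_says_fake = any(cue in reasoning.lower() for cue in fake_cues)
    coherence = 1.0 if evidence_says_fake == (verdict == "fake") else 0.0
    # Dissonance applies only to wrong verdicts: the reasoning surfaces
    # manipulation cues that the final (incorrect) verdict contradicts.
    dissonance = (1.0 - coherence) if verdict != gold_label else 0.0
    return AuditScores(perception, coherence, dissonance)

# Example: a response that notices vocoder artifacts but still answers "real".
scores = audit_response(
    reasoning="The clip shows metallic vocoder artifacts and unnatural prosody.",
    verdict="real",
    gold_label="fake",
    reference_cues={"vocoder", "prosody"},
    fake_cues={"vocoder", "artifacts"},
)
print(scores)  # high perception, low coherence, high dissonance
```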
Datasets
ASVspoof 2019 Logical Access (LA) dataset
Model(s)
AASIST-2, RawNet-2, CLAD (traditional ADDs); Qwen2-Audio-7B, Phi-4-multimodal, gemma-3n-E4B, granite-3.3-8b (Audio Language Models)
Author countries
United States