Toward Transdisciplinary Approaches to Audio Deepfake Discernment

Authors: Vandana P. Janeja, Christine Mallinson

Published: 2024-11-08 20:59:25+00:00

AI Summary

This paper advocates for a transdisciplinary approach to audio deepfake detection, integrating linguistic knowledge with AI methods to overcome limitations of current expert-agnostic AI models. It highlights the need to move beyond a solely AI-based approach and incorporate human expertise in language to improve the robustness and comprehensiveness of deepfake detection.

Abstract

This perspective calls for scholars across disciplines to address the challenge of audio deepfake detection and discernment through an interdisciplinary lens across Artificial Intelligence methods and linguistics. With an avalanche of tools for the generation of realistic-sounding fake speech on one side, the detection of deepfakes is lagging on the other. Particularly hindering audio deepfake detection is the fact that current AI models lack a full understanding of the inherent variability of language and the complexities and uniqueness of human speech. We see the promising potential in recent transdisciplinary work that incorporates linguistic knowledge into AI approaches to provide pathways for expert-in-the-loop and to move beyond expert agnostic AI-based methods for more robust and comprehensive deepfake detection.


Key findings
Augmenting audio data with expert-informed linguistic annotations significantly increased the accuracy of spoofed audio detection. Pilot studies suggest that training humans to listen for specific linguistic characteristics improves their ability to discern real from fake speech. Current AI models are limited by their lack of understanding of human language variability.
Approach
The authors propose incorporating linguistic knowledge and human expertise into AI-based audio deepfake detection. This involves augmenting training data with linguistically informed annotations and developing human discernment training to identify subtle cues indicative of fake audio.
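One way to picture the data-augmentation step is as appending expert annotations to each clip's feature vector before training a detector. The sketch below is illustrative only and is not the authors' implementation: the annotation labels (`unusual_pause`, `anomalous_pitch`, `odd_breath`), the function name `augment_features`, and the toy acoustic values are all hypothetical, since this summary does not name the linguistic features or models used.

```python
from typing import Dict, List, Sequence

# Hypothetical annotation labels; the summary does not name the actual features.
ANNOTATION_KEYS: List[str] = ["unusual_pause", "anomalous_pitch", "odd_breath"]


def augment_features(acoustic: Sequence[float],
                     annotations: Dict[str, bool]) -> List[float]:
    """Append one 0/1 flag per expert annotation to an acoustic feature vector.

    Annotations missing from the dictionary are treated as "not observed" (0.0).
    """
    flags = [1.0 if annotations.get(key, False) else 0.0
             for key in ANNOTATION_KEYS]
    return list(acoustic) + flags


# Toy clip: three made-up acoustic values plus a linguist's annotations.
clip_acoustic = [0.12, -0.40, 0.88]
clip_annotations = {"unusual_pause": True, "odd_breath": False}

augmented = augment_features(clip_acoustic, clip_annotations)
print(augmented)
```

The augmented vector could then be passed to any classifier in place of the raw acoustic features. Keeping the annotation flags in a fixed order means every clip yields a vector of the same length, whichever annotations the expert supplied.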
Datasets
Not specified.
Model(s)
The paper mentions that the authors' research uses single and ensemble models but does not specify the architectures.
Author countries
U.S.A.