Listening for Expert Identified Linguistic Features: Assessment of Audio Deepfake Discernment among Undergraduate Students

Authors: Noshaba N. Bhalli, Nehal Naqvi, Chloe Evered, Christine Mallinson, Vandana P. Janeja

Published: 2024-11-21 20:52:02+00:00

AI Summary

This study investigates whether training undergraduate students to identify expert-defined linguistic features in audio improves their ability to discern audio deepfakes. The researchers found that training significantly reduced students' uncertainty in evaluating audio clips and improved their ability to correctly identify clips they were initially unsure about.

Abstract

This paper evaluates the impact of training undergraduate students to improve their audio deepfake discernment ability by listening for expert-defined linguistic features. Such features have been shown to improve the performance of AI algorithms; here, we ascertain whether this improvement in AI algorithms also translates to improvement in the perceptual awareness and discernment ability of listeners. With humans as the weakest link in any cybersecurity solution, we propose that listener discernment is a key factor for improving the trustworthiness of audio content. In this study we determine whether training that familiarizes listeners with English language variation can improve their ability to discern audio deepfakes. We focus on undergraduate students, as this demographic group is constantly exposed to social media and the potential for deception and misinformation online. To the best of our knowledge, our work is the first study to address English audio deepfake discernment through such techniques. Our research goes beyond informational training by introducing targeted linguistic cues to listeners, via a training module, as a deepfake discernment mechanism. In a pre-/post- experimental design, we evaluated the impact of the training across 264 students, a representative cross section of all students at the University of Maryland, Baltimore County, split into experimental and control sections. Findings show that the experimental group had a statistically significant decrease in their uncertainty when evaluating audio clips and an improvement in their ability to correctly identify clips they were initially unsure about. While results are promising, future research will explore more robust and comprehensive trainings for greater impact.


Key findings
The experimental group showed a statistically significant decrease in uncertainty when evaluating audio clips, and they improved at correctly identifying clips they were initially unsure about, though the gain in overall deepfake detection accuracy was less pronounced. The control group also improved somewhat in accuracy, suggesting that general deepfake awareness is beneficial.
Approach
The researchers developed a training module focusing on expert-defined linguistic features (EDLFs) in audio, such as pitch, pauses, and consonant bursts. They conducted a pre-post experimental design with undergraduate students, comparing a trained group with a control group receiving general information on deepfakes.
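The pre-/post- design above turns on testing whether the shift in responses (for example, away from "unsure") between the two time points is statistically significant. As a minimal sketch only (the paper's actual test statistic is not specified here, and the counts below are hypothetical, not from the study), an exact McNemar test on paired pre/post responses could look like this:

```python
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar test on discordant pair counts.

    b: participants who were "unsure" pre-training but sure post-training
    c: participants who were sure pre-training but "unsure" post-training
    Under the null hypothesis, each discordant pair flips either way
    with probability 0.5, so b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of change
    k = min(b, c)
    # One-tail binomial probability, then doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) * 0.5 ** n
    return min(1.0, 2 * tail)

# Hypothetical counts (NOT the paper's data): 40 students moved from
# unsure to sure, 12 moved the other way.
p = mcnemar_exact(40, 12)
print(f"p = {p:.6f}")
```

With perfectly balanced flips the test returns p = 1.0, while a strongly one-directional shift like the hypothetical 40-vs-12 split yields a small p-value; the same structure applies to any paired pre/post categorical outcome in such a design.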
Datasets
A hybrid dataset assembled from several commonly used machine learning datasets, including ASVspoof 2017, FoR (Fake-or-Real), LJ Speech, MelGAN, Assem-VC, and WaveNet.
Model(s)
UNKNOWN
Author countries
USA