Human Detection of Political Speech Deepfakes across Transcripts, Audio, and Video

Authors: Matthew Groh, Aruna Sankaranarayanan, Nikhil Singh, Dong Young Kim, Andrew Lippman, Rosalind Picard

Published: 2022-02-25 18:47:32+00:00

AI Summary

This research investigates how accurately humans distinguish real political speeches from deepfakes across text, audio, and video modalities. Five pre-registered experiments with 2,215 participants show that audio-visual information significantly improves detection accuracy over text alone, and that deepfakes with text-to-speech audio are harder to identify than those voiced by actors.

Abstract

Recent advances in technology for hyper-realistic visual and audio effects provoke the concern that deepfake videos of political speeches will soon be indistinguishable from authentic video recordings. The conventional wisdom in communication theory predicts people will fall for fake news more often when the same version of a story is presented as a video rather than as text. We conduct 5 pre-registered randomized experiments with 2,215 participants to evaluate how accurately humans distinguish real political speeches from fabrications across base rates of misinformation, audio sources, question framings, and media modalities. We find that base rates of misinformation minimally influence discernment and that deepfakes with audio produced by state-of-the-art text-to-speech algorithms are harder to discern than the same deepfakes with voice actor audio. Moreover, across all experiments, we find audio and visual information enables more accurate discernment than text alone: human discernment relies more on how something is said (the audio-visual cues) than on what is said (the speech content).


Key findings
Across all experiments, audio and video information significantly enhanced deepfake detection accuracy compared to text alone. Text-to-speech deepfakes were harder to detect than those with voice actor audio. Base rates of misinformation had a minimal effect on detection accuracy.
Approach
The study comprises five pre-registered randomized experiments testing how well humans detect deepfakes. Participants were shown real and fabricated political speeches in varying combinations of text, audio, and video modalities, and detection accuracy was compared across conditions to assess the effects of modality, audio source, and the base rate of misinformation.
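To make the analysis concrete, below is a minimal sketch of how per-modality discernment could be computed from a response log. This is not the authors' code: the column names, the simulated responses, and the use of d' (a signal-detection sensitivity measure that separates discernment from response bias) are assumptions for illustration.

import numpy as np
import pandas as pd
from scipy.stats import norm

# Hypothetical response log: one row per participant judgment.
# Column names are illustrative, not from the paper's released data.
rng = np.random.default_rng(0)
responses = pd.DataFrame({
    "modality":    ["text", "audio", "video"] * 100,
    "is_fake":     rng.integers(0, 2, 300),      # ground truth: 1 = fabricated
    "judged_fake": rng.integers(0, 2, 300),      # participant's verdict
})

def discernment_stats(df):
    """Raw accuracy plus d' for one experimental condition."""
    hit_rate = df.loc[df.is_fake == 1, "judged_fake"].mean()  # fakes called fake
    fa_rate = df.loc[df.is_fake == 0, "judged_fake"].mean()   # reals called fake
    # Clip rates away from 0 and 1 to keep the z-scores finite.
    hit_rate, fa_rate = np.clip([hit_rate, fa_rate], 0.01, 0.99)
    return pd.Series({
        "accuracy": (df.judged_fake == df.is_fake).mean(),
        "d_prime": norm.ppf(hit_rate) - norm.ppf(fa_rate),
    })

print(responses.groupby("modality")[["is_fake", "judged_fake"]]
               .apply(discernment_stats))

Reporting d' alongside raw accuracy is one way to probe a base-rate manipulation like the paper's: shifting the proportion of fakes changes how often guessing happens to be correct, but should leave a genuine discernment measure like d' largely unchanged.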
Datasets
Presidential Deepfake Dataset (PDD), plus additional videos from Barari et al. (2021)
Model(s)
UNKNOWN (the paper focuses on human detection, not the models used to create the deepfakes)
Author countries
USA