Deepfake Detection by Human Crowds, Machines, and Machine-informed Crowds

Authors: Matthew Groh, Ziv Epstein, Chaz Firestone, Rosalind Picard

Published: 2021-05-13 18:22:16+00:00

AI Summary

This research compares human and machine performance in deepfake detection using two online studies with 15,016 participants. Human observers showed comparable accuracy to a leading computer vision model, but with different error patterns. Combining human and machine judgments improved accuracy, but inaccurate model predictions sometimes decreased human accuracy.

Abstract

The recent emergence of machine-manipulated media raises an important societal question: how can we know if a video that we watch is real or fake? In two online studies with 15,016 participants, we present authentic videos and deepfakes and ask participants to identify which is which. We compare the performance of ordinary human observers against the leading computer vision deepfake detection model and find them similarly accurate while making different kinds of mistakes. Together, participants with access to the model's prediction are more accurate than either alone, but inaccurate model predictions often decrease participants' accuracy. To probe the relative strengths and weaknesses of humans and machines as detectors of deepfakes, we examine human and machine performance across video-level features, and we evaluate the impact of pre-registered randomized interventions on deepfake detection. We find that manipulations designed to disrupt visual processing of faces hinder human participants' performance while mostly not affecting the model's performance, suggesting a role for specialized cognitive capacities in explaining human deepfake detection performance.


Key findings
Humans and the leading model exhibited similar overall accuracy, but made different types of errors. Combining human and machine judgments yielded the best results, though inaccurate model predictions sometimes negatively impacted human performance. Manipulations disrupting visual face processing hindered human but not machine accuracy, suggesting a role for specialized cognitive capacities in human deepfake detection.
Approach
The researchers conducted two online experiments in which participants judged whether presented videos were authentic or deepfakes. Participant performance was compared to that of a state-of-the-art deepfake detection model, and the researchers also measured how showing the model's prediction changed participant accuracy.
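The "machine-informed crowd" idea above amounts to blending a crowd's judgment with a model's prediction. As a minimal illustration (not the paper's actual aggregation rule), one simple choice is a weighted average of the crowd's mean fakeness rating and the model's predicted probability; the function name and the 50/50 weighting here are hypothetical:

```python
from statistics import mean

def combined_fakeness(human_scores, model_score, model_weight=0.5):
    """Blend a crowd's mean fakeness rating with a model's prediction.

    human_scores: per-participant probabilities that the video is fake.
    model_score: the model's predicted probability that the video is fake.
    model_weight: how much to trust the model (a hypothetical choice).
    """
    crowd = mean(human_scores)
    return (1 - model_weight) * crowd + model_weight * model_score

# A crowd that leans "real" paired with a model that leans "fake":
crowd_says = [0.2, 0.3, 0.25, 0.4]        # crowd mean = 0.2875
print(combined_fakeness(crowd_says, 0.9))  # blend lands between the two
```

Note how a confidently wrong model prediction pulls the blend away from an accurate crowd, which mirrors the paper's finding that inaccurate model predictions can decrease participant accuracy.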
Datasets
DeepFake Detection Challenge (DFDC) dataset, including training and holdout sets; additional videos of Kim Jong-un and Vladimir Putin.
Model(s)
The winning model from the DeepFake Detection Challenge (DFDC) competition, which uses a multitask cascaded convolutional neural network (MTCNN) for face detection, EfficientNet-B7 for feature encoding, and various data augmentations.
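The pipeline above is a cascade: detect faces in sampled frames, score each face crop with the classifier, then aggregate into one video-level prediction. The structural sketch below uses stand-in stubs in place of the real MTCNN and EfficientNet-B7 networks; the function names and the mean-aggregation step are illustrative assumptions, not the winning model's exact code:

```python
from statistics import mean

def detect_faces(frame):
    # Stand-in for MTCNN face detection: return face crops from one frame.
    # Here we pretend every frame yields a single usable crop.
    return [frame]

def score_face(face_crop):
    # Stand-in for EfficientNet-B7 plus a classifier head: probability
    # that the face is manipulated. A real model runs a forward pass here.
    return 0.5

def video_fakeness(frames):
    # Score every detected face in every sampled frame, then average the
    # per-face scores into a single video-level prediction (one common
    # aggregation choice; 0.5 when no faces are found).
    scores = [score_face(f) for frame in frames for f in detect_faces(frame)]
    return mean(scores) if scores else 0.5
```

Keeping detection, per-face scoring, and aggregation as separate stages makes it easy to swap any one component, e.g. replacing the scorer while reusing the same face-detection front end.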
Author countries
USA