Study of detecting behavioral signatures within DeepFake videos

Authors: Qiaomu Miao, Sinhwa Kang, Stacy Marsella, Steve DiPaola, Chao Wang, Ari Shapiro

Published: 2022-08-06 18:30:53+00:00

AI Summary

This research investigates whether behavioral cues in DeepFake videos can be used for detection. By manipulating video content to transfer the behaviors of different speakers onto a target person (Donald Trump), the study examines whether viewers can distinguish real from synthetic videos based on behavioral signatures.

Abstract

There is strong interest in the generation of synthetic video imagery of people talking for various purposes, including entertainment, communication, training, and advertisement. With the development of deep fake generation models, synthetic video imagery will soon be visually indistinguishable to the naked eye from naturally captured video. In addition, many methods continue to improve, making them harder to detect even through careful forensic visual analysis. Some deep fake videos are produced through facial puppetry, which directly controls the head and face of the synthetic image through the movements of an actor, allowing the actor to 'puppet' the image of another person. In this paper, we address the question of whether one person's movements can be distinguished from those of the original speaker by preserving the visual appearance of the speaker while transferring the behavior signals from another source. We conduct a study comparing synthetic imagery whose behavior: 1) originates from a different person speaking a different utterance, 2) originates from the same person speaking a different utterance, and 3) originates from a different person speaking the same utterance. Our study shows that synthetic videos in all three cases are seen as less real and less engaging than the original source video. Our results indicate that there could be a behavioral signature, detectable from a person's movements and separate from their visual appearance, and that this behavioral signature could be used to distinguish a deep fake from a properly captured video.


Key findings
Viewers consistently identified the original videos as more natural and engaging than the DeepFakes. This suggests that behavioral signatures, independent of visual quality, can be detected by humans and may serve as a valuable cue for DeepFake detection.
Approach
The researchers created DeepFake videos using facial puppetry techniques (Wav2Lip and FOMM), transferring the behaviors of different speakers onto Donald Trump's face. They then conducted a user study comparing real and synthetic videos, assessing perceived naturalness and engagement.
Datasets
Videos of Donald Trump, Tom Cruise, Barack Obama, Taylor Swift, and Emma Watson were used to create the DeepFakes. The specific dataset names are not explicitly provided.
Model(s)
Wav2Lip (for lip-syncing) and First Order Motion Model (FOMM) (for face reenactment).
Author countries
USA, Canada