How Do Deepfakes Move? Motion Magnification for Deepfake Source Detection

Authors: Umur Aybars Ciftci, Ilke Demir

Published: 2022-12-28 18:59:21+00:00

AI Summary

This paper presents a novel deepfake source detection approach based on motion magnification. By magnifying subtle facial movements, the method amplifies the differences between real and fake videos, enabling identification of the generative model used to create a deepfake. The approach reaches 97.17% and 94.03% video source detection accuracy on two multi-source datasets.

Abstract

With the proliferation of deep generative models, deepfakes are improving in quality and quantity every day. However, pristine videos carry subtle authenticity signals that SOTA GANs do not replicate. We contrast the movement in deepfakes and authentic videos via motion magnification, toward building a generalized deepfake source detector. The sub-muscular motion in faces is interpreted differently by each generative model, and this difference is reflected in their generative residue. Our approach exploits the gap between real motion and amplified GAN fingerprints by combining deep and traditional motion magnification, detecting whether a video is fake and, if so, its source generator. Evaluating our approach on two multi-source datasets, we obtain 97.17% and 94.03% accuracy for video source detection. We compare against the prior deepfake source detector and other complex architectures. We also analyze the importance of the magnification amount, phase extraction window, backbone network architecture, sample count, and sample length. Finally, we report our results across skin tones to assess bias.


Key findings
The proposed method achieves 97.17% and 94.03% video source detection accuracy on the FaceForensics++ and FakeAVCeleb datasets, respectively. It outperforms the prior deepfake source detector and exhibits relatively high per-class accuracy, particularly for identifying fake videos. Ablation studies highlight the importance of the dual motion magnification and of the backbone network architecture.
Approach
The approach combines traditional and deep motion magnification techniques to amplify subtle motion artifacts in videos. These magnified motion patterns are then fed into a 3D CNN to classify the deepfake's source generator. Finally, per-sample predictions are aggregated using majority voting to obtain a video-level classification.
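A minimal sketch of the magnify-then-vote pipeline follows. This is not the authors' code: the traditional branch is illustrated with linear Eulerian magnification (Wu et al., 2012) rather than the paper's exact phase-based variant, the deep magnification branch and the trained 3D CNN are out of scope here, and the parameter names `alpha`, `fs`, `lo`, and `hi` are assumptions.

```python
# Sketch only: linear Eulerian magnification as a stand-in for the paper's
# traditional branch, plus the video-level majority vote over sample
# predictions described in the Approach section.
from collections import Counter

import numpy as np
from scipy.signal import butter, filtfilt


def linear_motion_magnify(frames: np.ndarray, alpha: float = 10.0,
                          fs: float = 30.0, lo: float = 0.3,
                          hi: float = 3.0) -> np.ndarray:
    """Band-pass each pixel's intensity over time and add the amplified
    band back onto the frames. `frames` has shape (T, H, W, C); T must
    exceed the filter's padding length (~15 frames here)."""
    b, a = butter(2, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    band = filtfilt(b, a, frames.astype(np.float64), axis=0)
    return np.clip(frames + alpha * band, 0, 255).astype(np.uint8)


def video_label(sample_preds: list[int]) -> int:
    """Aggregate per-sample class predictions into one video-level label
    by majority vote."""
    return Counter(sample_preds).most_common(1)[0][0]


# Example: magnify a random 32-frame clip, then vote over (mock) per-sample
# class predictions produced by a classifier.
clip = np.random.randint(0, 256, size=(32, 112, 112, 3), dtype=np.uint8)
magnified = linear_motion_magnify(clip, alpha=10.0)
print(magnified.shape, video_label([0, 2, 2, 1, 2]))  # (32, 112, 112, 3) 2
```

The amplification factor `alpha` corresponds to the "magnification amount" ablated in the paper; the majority vote is what turns per-sample predictions into the reported video-level accuracy.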
Datasets
FaceForensics++ and FakeAVCeleb
Model(s)
A 3D convolutional neural network (CNN) similar to C3D, compared against ResNet50, ResNet152, VGG19, Inception, DenseNet201, and Xception backbones.
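For concreteness, below is a minimal C3D-style 3D CNN sketch in PyTorch; the depth, channel widths, and class count are illustrative assumptions, not the paper's exact configuration.

```python
# A small C3D-like 3D CNN for source classification (sketch under assumed
# hyperparameters). Input layout: (batch, channels, frames, height, width).
import torch
import torch.nn as nn


class C3DLike(nn.Module):
    def __init__(self, num_classes: int = 5):  # assumed: real + 4 generators
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool3d((1, 2, 2)),  # pool spatially only in the first stage
            nn.Conv3d(64, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool3d(2),
            nn.Conv3d(128, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool3d(2),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool3d(1),  # global spatio-temporal pooling
            nn.Flatten(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))


# Example: two 16-frame 112x112 RGB clips -> per-class logits.
logits = C3DLike()(torch.randn(2, 3, 16, 112, 112))
print(logits.shape)  # torch.Size([2, 5])
```

The 3D convolutions process short frame sequences jointly, which is what lets the network pick up the temporal motion patterns that the magnification step amplifies.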
Author countries
USA