Detecting Deepfake Videos Using Euler Video Magnification

Authors: Rashmiranjan Das, Gaurav Negi, Alan F. Smeaton

Published: 2021-01-27 17:37:23+00:00

AI Summary

This paper investigates the use of Euler Video Magnification (EVM) for deepfake video detection. EVM highlights subtle features like skin pulsation, and these extracted features are used to train three models (SSIM, LSTM, and heart rate estimation) to classify real and deepfake videos.

Abstract

Recent advances in artificial intelligence make it increasingly hard to distinguish between genuine and counterfeit media, especially images and videos. One recent development is the rise of deepfake videos, which manipulate videos using advanced machine learning techniques. This involves replacing the face of an individual in a source video with the face of a second person in the destination video. The technique is becoming increasingly refined, as deepfakes grow more seamless and simpler to compute. Combined with the reach and speed of social media, deepfakes could easily fool individuals by depicting someone saying things that never happened, and thus could persuade people to believe fictional scenarios, creating distress and spreading fake news. In this paper, we examine a technique for possible identification of deepfake videos. We use Euler video magnification, which applies spatial decomposition and temporal filtering to video data to highlight and magnify hidden features like skin pulsation and subtle motions. Our approach uses features extracted with the Euler technique to train three models to classify counterfeit and unaltered videos, and we compare the results with existing techniques.


Key findings
The results show that while pre-processing with EVM did not improve the accuracy of deepfake detection across all models, it did change which videos each model detected. The heart rate estimation model showed that deepfakes accurately replicate skin pulsation, suggesting advanced GANs effectively model the true data distribution. The SSIM model performed better on the original videos than on the EVM-processed videos.
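The paper does not reproduce its SSIM model here, but the underlying metric is standard. As illustration, a minimal NumPy sketch of SSIM between two frames (a single global window; the standard metric averages SSIM over local windows), which could serve as a per-frame-pair feature:

```python
import numpy as np

def ssim(x, y, c1=0.01**2, c2=0.03**2):
    """Global structural similarity between two frames with values in [0, 1].

    Simplified single-window variant of SSIM for illustration; the
    standard metric averages this quantity over local sliding windows.
    """
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()  # cross-covariance of the two frames
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx**2 + my**2 + c1) * (vx + vy + c2)
    )

# Consecutive-frame SSIM as a temporal-consistency feature:
# identical frames score 1.0; a perturbed frame scores lower.
rng = np.random.default_rng(1)
frame = rng.random((32, 32))
print(ssim(frame, frame))                                      # ≈ 1.0
print(ssim(frame, frame + 0.1 * rng.random((32, 32))))         # < 1.0
```

EVM amplifies small temporal variations, so a drop in consecutive-frame SSIM after magnification is consistent with the finding that the SSIM model fared better on unprocessed video.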
Approach
The authors use Euler Video Magnification to amplify subtle visual cues in videos, such as skin pulsation and small tremors. Features extracted from the magnified videos are then fed into three different machine learning models for classification.
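To make the magnification step concrete, here is a minimal sketch of the temporal-filtering core of EVM: band-pass each pixel's intensity over time in the heart-rate band and add the amplified signal back. This is a simplification; the full method also decomposes each frame into a spatial (Laplacian) pyramid before filtering, which is omitted here.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def magnify_temporal(frames, fps, low=0.8, high=3.0, alpha=20.0):
    """Amplify subtle temporal variation in a video clip.

    Simplified sketch of the temporal step of Euler Video Magnification:
    band-pass each pixel over time in the low..high Hz band (0.8-3 Hz
    covers typical heart rates) and add alpha times the filtered signal
    back to the input. frames: float array (T, H, W) with values in [0, 1].
    """
    b, a = butter(2, [low, high], btype="band", fs=fps)
    filtered = filtfilt(b, a, frames, axis=0)  # per-pixel temporal band-pass
    return np.clip(frames + alpha * filtered, 0.0, 1.0)

# Synthetic clip: constant grey frames carrying a faint 1.2 Hz pulsation
fps, T = 30, 150
t = np.arange(T) / fps
pulse = 0.002 * np.sin(2 * np.pi * 1.2 * t)   # ~72 bpm, tiny amplitude
frames = 0.5 + pulse[:, None, None] + np.zeros((T, 8, 8))

magnified = magnify_temporal(frames, fps)
print(frames.std(), magnified.std())  # pulsation is strongly amplified
```

The amplified pulsation is what the downstream models then consume as a feature.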
Datasets
Deepfake Detection Challenge Dataset (DFDC) and a self-created dataset of deepfake videos generated using DeepFaceLab.
Model(s)
SSIM, LSTM, and a heart rate estimation model. Standard machine learning models (Logistic Regression, Decision Tree, Neural Net, Neural Net + Regression Tree) were also used for comparison.
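The summary does not specify how the heart rate estimation model works internally; a common remote-photoplethysmography baseline (an assumption, not the paper's exact method) estimates heart rate from the per-frame mean skin intensity by picking the dominant frequency in the plausible heart-rate band:

```python
import numpy as np

def estimate_heart_rate(green_means, fps, low=0.8, high=3.0):
    """Estimate heart rate in bpm from per-frame mean green-channel intensity.

    Simple rPPG baseline (assumed here for illustration): detrend the
    trace, take its FFT, and return the dominant frequency in the
    0.8-3 Hz (48-180 bpm) band, converted to beats per minute.
    """
    sig = np.asarray(green_means) - np.mean(green_means)
    freqs = np.fft.rfftfreq(len(sig), d=1.0 / fps)
    power = np.abs(np.fft.rfft(sig)) ** 2
    band = (freqs >= low) & (freqs <= high)
    return 60.0 * freqs[band][np.argmax(power[band])]

# Synthetic trace: 1.2 Hz pulsation (~72 bpm) plus small noise
fps, T = 30, 300
t = np.arange(T) / fps
rng = np.random.default_rng(0)
trace = 0.5 + 0.01 * np.sin(2 * np.pi * 1.2 * t) + 0.001 * rng.standard_normal(T)
print(estimate_heart_rate(trace, fps))  # ≈ 72 bpm
```

If a deepfake GAN reproduces the source video's pulsation faithfully, both real and fake videos yield plausible estimates, which is consistent with the paper's finding that this signal alone did not separate the two classes.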
Author countries
Ireland