DeepRhythm: Exposing DeepFakes with Attentional Visual Heartbeat Rhythms

Authors: Hua Qi, Qing Guo, Felix Juefei-Xu, Xiaofei Xie, Lei Ma, Wei Feng, Yang Liu, Jianjun Zhao

Published: 2020-06-13 12:56:46+00:00

AI Summary

DeepRhythm detects deepfakes by analyzing visual heartbeat rhythms in videos. It uses dual spatial-temporal attention to adapt to varying face and fake types, demonstrating effectiveness and generalization across datasets.

Abstract

As GAN-based face image and video generation techniques, widely known as DeepFakes, have become increasingly mature and realistic, there is a pressing demand for effective DeepFake detectors. Motivated by the fact that remote visual photoplethysmography (PPG) is made possible by monitoring the minuscule periodic changes in skin color caused by blood pumping through the face, we conjecture that the normal heartbeat rhythms found in real face videos will be disrupted, or even entirely broken, in a DeepFake video, making them a potentially powerful indicator for DeepFake detection. In this work, we propose DeepRhythm, a DeepFake detection technique that exposes DeepFakes by monitoring heartbeat rhythms. DeepRhythm utilizes dual-spatial-temporal attention to adapt to dynamically changing face and fake types. Extensive experiments on the FaceForensics++ and DFDC-preview datasets have confirmed our conjecture and demonstrated not only the effectiveness but also the generalization capability of DeepRhythm across different datasets, various DeepFake generation techniques, and multifarious challenging degradations.


Key findings
DeepRhythm outperforms state-of-the-art deepfake detection methods on FaceForensics++ and generalizes to the DFDC-preview dataset. The approach is robust to various degradations such as JPEG compression and noise, although performance degrades under temporal sampling.
Approach
DeepRhythm leverages the subtle periodic skin color changes caused by heartbeats (remote PPG). It uses a motion-magnified spatial-temporal representation to highlight these rhythms and a dual-spatial-temporal attention network to improve detection accuracy and robustness.
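The core of such a representation can be illustrated with a simplified sketch: crop and align the face in each frame, divide it into a grid of cells, and average the color of each cell per frame, so that the subtle periodic skin-color variation becomes a row-wise signal over time. This is only a hypothetical illustration of the idea (the paper's actual pipeline also applies motion magnification and attention); the grid size and function names below are assumptions, not the authors' implementation.

```python
import numpy as np

def spatial_temporal_map(frames, grid=(5, 5)):
    """Build a simplified spatial-temporal map from face frames.

    frames: float array of shape (T, H, W, 3), cropped/aligned face frames.
    grid:   number of (rows, cols) cells the face region is divided into.
    Returns an array of shape (rows * cols, T, 3): the mean color of each
    grid cell in every frame, so heartbeat-induced color changes appear
    as periodic variation along the time axis.
    """
    T, H, W, _ = frames.shape
    gr, gc = grid
    cells = []
    for r in range(gr):
        for c in range(gc):
            block = frames[:, r * H // gr:(r + 1) * H // gr,
                              c * W // gc:(c + 1) * W // gc, :]
            cells.append(block.mean(axis=(1, 2)))  # (T, 3) mean color per frame
    return np.stack(cells)  # (gr * gc, T, 3)

# Synthetic demo: 64 frames of a uniform 50x50 "face" with a faint
# periodic signal injected into the green channel to mimic remote PPG.
t = np.arange(64)
frames = np.full((64, 50, 50, 3), 128.0)
frames[..., 1] += 0.5 * np.sin(2 * np.pi * t / 25)[:, None, None]
st_map = spatial_temporal_map(frames)
print(st_map.shape)  # (25, 64, 3)
```

In a real detector, a map like this would be fed to a classifier (the paper uses attention-weighted CNN features); for a fake video, the per-cell time series lose the coherent periodic structure seen in genuine faces.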
Datasets
FaceForensics++, DFDC-preview
Model(s)
ResNet18, LSTM, MesoNet (as a component)
Author countries
Japan, Singapore, USA, China