Generalizable Deepfake Detection with Phase-Based Motion Analysis
Authors: Ekta Prashnani, Michael Goebel, B. S. Manjunath
Published: 2022-11-17 06:28:01+00:00
AI Summary
PhaseForensics, a deepfake video detection method, uses a phase-based motion representation of facial temporal dynamics to improve cross-dataset generalization and robustness to distortions and adversarial attacks. It leverages temporal phase variations in face sub-regions, providing a robust motion estimate less susceptible to cross-dataset variations and adversarial perturbations.
Abstract
We propose PhaseForensics, a DeepFake (DF) video detection method that leverages a phase-based motion representation of facial temporal dynamics. Existing methods that rely on temporal inconsistencies for DF detection offer many advantages over typical frame-based methods. However, they still show limited cross-dataset generalization and robustness to common distortions. These shortcomings are partially due to error-prone motion estimation and landmark tracking, or to the susceptibility of pixel intensity-based features to spatial distortions and cross-dataset domain shifts. Our key insight to overcome these issues is to leverage the temporal phase variations in the band-pass components of the Complex Steerable Pyramid on face sub-regions. This not only enables a robust estimate of the temporal dynamics in these regions, but is also less prone to cross-dataset variations. Furthermore, the band-pass filters used to compute the local per-frame phase form an effective defense against the perturbations commonly seen in gradient-based adversarial attacks. Overall, with PhaseForensics, we show improved distortion and adversarial robustness, and state-of-the-art cross-dataset generalization, with 91.2% video-level AUC on the challenging CelebDFv2 dataset (a recent state-of-the-art method achieves 86.9%).
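
The idea of reading motion from temporal phase variations in a band-pass component can be illustrated with a minimal 1D sketch. This is not the authors' implementation and does not use the Complex Steerable Pyramid: it assumes a single one-sided Gaussian band-pass filter as a stand-in for one pyramid band, and a synthetic sinusoidal "texture" that translates between frames. The function names (`bandpass_phase`, `phase_motion`) and all parameter values are hypothetical. Each frame is also given a different gain and brightness offset, to show that the phase of the band-pass response is largely unaffected by such intensity changes.

```python
import numpy as np

def bandpass_phase(frame, center_bin, sigma_bins=2.0):
    """Local phase of a 1D frame under a one-sided Gaussian band-pass filter.

    Assumption: this single filter stands in for one band of a complex
    pyramid; it is applied by multiplication in the frequency domain.
    """
    n = frame.shape[-1]
    bins = np.arange(n)
    transfer = np.exp(-0.5 * ((bins - center_bin) / sigma_bins) ** 2)
    # Keep positive frequencies only, so the filtered output is complex.
    transfer[n // 2:] = 0.0
    response = np.fft.ifft(np.fft.fft(frame) * transfer)
    return np.angle(response)

def phase_motion(frames, center_bin):
    """Displacement of each frame relative to frame 0, in samples."""
    n = frames.shape[-1]
    omega = 2.0 * np.pi * center_bin / n  # band center, radians per sample
    phases = np.stack([bandpass_phase(f, center_bin) for f in frames])
    # Temporal phase difference, wrapped to (-pi, pi].
    delta = np.angle(np.exp(1j * (phases - phases[0])))
    # Fourier shift theorem: a shift by d changes the phase by -omega * d.
    return -delta.mean(axis=1) / omega

# Synthetic "video": one texture component translating by sub-sample amounts.
n, center_bin = 256, 16
x = np.arange(n)
omega = 2.0 * np.pi * center_bin / n
shifts = np.array([0.0, 0.5, 1.0, 1.5])    # true displacement per frame
gains = np.array([1.0, 0.8, 1.3, 0.6])     # per-frame contrast change
offsets = np.array([0.0, 0.2, -0.1, 0.4])  # per-frame brightness change
frames = np.stack([
    g * np.cos(omega * (x - d)) + 0.3 * np.cos(2.0 * np.pi * 2 * x / n) + b
    for d, g, b in zip(shifts, gains, offsets)
])

estimated = phase_motion(frames, center_bin)
print(np.round(estimated, 4))
```

In this toy setting the estimated displacements should closely match `shifts` despite the gain and offset changes, because scaling does not alter phase and the band-pass filter suppresses the constant and low-frequency terms. Real face video is two-dimensional and broadband, so a multi-scale, multi-orientation decomposition would be needed in practice.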