Exploiting Style Latent Flows for Generalizing Deepfake Video Detection

Authors: Jongwook Choi, Taehoon Kim, Yonghyun Jeong, Seungryul Baek, Jongwon Choi

Published: 2024-03-11 10:35:58+00:00

AI Summary

This paper proposes a novel deepfake video detection framework that leverages the abnormal temporal changes in the style latent vectors of generated videos. A StyleGRU module, trained with contrastive learning, captures these dynamics, and a style attention module fuses the resulting style features with content-based features for improved detection accuracy.
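Since the pipeline hinges on contrastive training of the StyleGRU, here is a minimal sketch of one plausible objective: a supervised contrastive (NT-Xent-style) loss over StyleGRU clip embeddings, where clips sharing a real/fake label act as positives. The loss variant, temperature, and batch construction are illustrative assumptions, not the paper's exact recipe.

    import torch
    import torch.nn.functional as F

    def supcon_loss(embeddings, labels, temperature=0.1):
        # Supervised contrastive loss sketch: clips with the same real/fake
        # label are positives. Variant and temperature are assumptions.
        z = F.normalize(embeddings, dim=1)              # (B, D), unit norm
        sim = z @ z.t() / temperature                   # pairwise cosine sims
        eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
        pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
        sim = sim.masked_fill(eye, float("-inf"))       # drop self-similarity
        log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
        # mean log-probability of positives per anchor
        # (anchors with no positives contribute zero)
        loss = -(log_prob.masked_fill(~pos, 0.0).sum(1)
                 / pos.sum(1).clamp(min=1)).mean()
        return loss

    if __name__ == "__main__":
        emb = torch.randn(8, 1024)                      # StyleGRU clip features
        y = torch.tensor([0, 0, 1, 1, 0, 1, 0, 1])      # 0 = real, 1 = fake
        print(supcon_loss(emb, y).item())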

Abstract

This paper presents a new approach to fake video detection based on the analysis of style latent vectors and their abnormal temporal behavior in generated videos. We discovered that generated facial videos exhibit distinctive temporal changes in their style latent vectors, which are inevitable during the generation of temporally stable videos with various facial expressions and geometric transformations. Our framework utilizes the StyleGRU module, trained by contrastive learning, to represent the dynamic properties of style latent vectors. Additionally, we introduce a style attention module that integrates StyleGRU-generated features with content-based features, enabling the detection of both visual and temporal artifacts. We demonstrate our approach across various benchmark scenarios in deepfake detection, showing its superiority in cross-dataset and cross-manipulation settings. Through further analysis, we also validate the importance of using temporal changes of style latent vectors to improve the generality of deepfake video detection.
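To make "temporal changes of style latent vectors" concrete, below is a minimal sketch of a StyleGRU-style encoder: per-frame style latents (e.g., from a pretrained GAN-inversion encoder, an assumption here) are differenced across time to form a style flow, which a GRU summarizes into a clip-level feature. The 512-d latent size, hidden size, and single GRU layer are illustrative assumptions, not the paper's exact configuration.

    import torch
    import torch.nn as nn

    class StyleGRU(nn.Module):
        # Encodes the temporal flow of per-frame style latent vectors.
        # Latent/hidden sizes and the single-layer GRU are assumptions.
        def __init__(self, latent_dim=512, hidden_dim=1024):
            super().__init__()
            self.gru = nn.GRU(latent_dim, hidden_dim, batch_first=True)

        def forward(self, style_latents):
            # style_latents: (B, T, latent_dim), one style vector per frame,
            # e.g. from a pretrained GAN-inversion encoder (assumed).
            style_flow = style_latents[:, 1:] - style_latents[:, :-1]
            _, h_n = self.gru(style_flow)    # h_n: (1, B, hidden_dim)
            return h_n.squeeze(0)            # clip-level style feature

    if __name__ == "__main__":
        latents = torch.randn(2, 16, 512)    # 2 clips, 16 frames each
        print(StyleGRU()(latents).shape)     # torch.Size([2, 1024])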


Key findings
The proposed method outperforms existing state-of-the-art methods in cross-dataset and cross-manipulation scenarios. Ablation studies confirm the importance of both the StyleGRU module and the style attention module. Analysis validates the effectiveness of using temporal changes in style latent vectors for improved generalization in deepfake video detection.
Approach
Style latent vectors are extracted from individual video frames, and their temporal variations (style flows) are encoded by a StyleGRU module trained with contrastive learning. These style-based temporal features are then fused with content-based features through a style attention module before final classification.
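The fusion step can be pictured as cross-attention in which content tokens (e.g., from a 3D ResNet-50) attend to the StyleGRU feature. The sketch below shows one such style attention layer; the dimensions, the single attention layer, and the residual fusion are assumptions rather than the paper's exact architecture.

    import torch
    import torch.nn as nn

    class StyleAttention(nn.Module):
        # Content tokens (e.g. 3D ResNet-50 features) attend to the StyleGRU
        # feature; dimensions and residual fusion are assumptions.
        def __init__(self, content_dim=2048, style_dim=1024, num_heads=8):
            super().__init__()
            self.style_proj = nn.Linear(style_dim, content_dim)
            self.attn = nn.MultiheadAttention(content_dim, num_heads,
                                              batch_first=True)

        def forward(self, content_tokens, style_feat):
            # content_tokens: (B, N, content_dim) spatio-temporal tokens
            # style_feat:     (B, style_dim) clip feature from the StyleGRU
            kv = self.style_proj(style_feat).unsqueeze(1)  # (B, 1, content_dim)
            fused, _ = self.attn(content_tokens, kv, kv)   # cross-attention
            return fused + content_tokens                  # residual fusion

    if __name__ == "__main__":
        content = torch.randn(2, 49, 2048)
        style = torch.randn(2, 1024)
        print(StyleAttention()(content, style).shape)  # torch.Size([2, 49, 2048])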
Datasets
FaceForensics++, FaceShifter, DeeperForensics, Celeb-DF-v2, DeepfakeDetection
Model(s)
StyleGRU, 3D ResNet-50, Style Attention Module, Temporal Transformer Encoder
Author countries
South Korea