Efficient Temporally-Aware DeepFake Detection using H.264 Motion Vectors

Authors: Peter Grönquist, Yufan Ren, Qingyi He, Alessio Verardo, Sabine Süsstrunk

Published: 2023-11-17 00:21:02+00:00

AI Summary

This paper proposes a DeepFake detection method based on H.264 Motion Vectors (MVs) and Information Masks (IMs), which are readily available in compressed video streams. The approach outperforms per-frame RGB-only baselines and matches the generalization of optical flow-based methods at a fraction of their computational cost.

Abstract

Video DeepFakes are fake media created with Deep Learning (DL) that manipulate a person's expression or identity. Most current DeepFake detection methods analyze each frame independently, ignoring inconsistencies and unnatural movements between frames. Some newer methods employ optical flow models to capture this temporal aspect, but they are computationally expensive. In contrast, we propose using the related but often ignored Motion Vectors (MVs) and Information Masks (IMs) from the H.264 video codec to detect temporal inconsistencies in DeepFakes. Our experiments show that this approach is effective and has minimal computational costs compared with per-frame RGB-only methods. This could lead to new, real-time temporally-aware DeepFake detection methods for video calls and streaming.
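
As an illustration of how this codec-level motion information can be accessed without running an optical flow model, the sketch below decodes a video with PyAV and reads the motion vectors that the H.264 decoder attaches as frame side data. The file name is hypothetical, and this relies on ffmpeg's generic flags2=+export_mvs option rather than the authors' own extraction pipeline, which the abstract does not detail.

```python
# Minimal sketch: reading H.264 motion vectors as decoder side data with PyAV.
# Assumes PyAV is installed and the underlying ffmpeg decoder honours the
# flags2=+export_mvs option; "input.mp4" is a placeholder file name.
import av
import numpy as np

container = av.open("input.mp4")
stream = container.streams.video[0]
stream.codec_context.options = {"flags2": "+export_mvs"}  # expose MVs as side data

for i, frame in enumerate(container.decode(stream)):
    sd = frame.side_data.get("MOTION_VECTORS")
    if sd is None:
        continue  # intra-coded frames carry no motion vectors
    mvs = sd.to_ndarray()  # structured array with src_x/src_y, dst_x/dst_y, w, h, ...
    dx = mvs["dst_x"].astype(np.int32) - mvs["src_x"]  # per-block horizontal motion
    dy = mvs["dst_y"].astype(np.int32) - mvs["src_y"]  # per-block vertical motion
    print(f"frame {i}: {len(mvs)} blocks, "
          f"mean |dx|={np.abs(dx).mean():.2f}, mean |dy|={np.abs(dy).mean():.2f}")
```

Because these vectors are computed by the encoder and shipped inside the bitstream, reading them costs almost nothing compared to estimating optical flow from decoded RGB frames.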


Key findings
The proposed method using MVs and IMs achieves higher accuracy and better generalization than RGB-only methods, while matching the generalization of optical flow-based approaches at a significantly lower computational cost. This efficiency demonstrates the potential for real-time DeepFake detection.
Approach
The method leverages Motion Vectors (MVs) and Information Masks (IMs) extracted from the H.264 video codec to detect temporal inconsistencies in DeepFakes. A MobileNetV3-based classifier is trained on these features, either on their own or stacked with the RGB frames, to classify videos as real or fake.
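
A minimal sketch of such a classifier, assuming torchvision's MobileNetV3-Small as the backbone: the first convolution is widened to accept a stacked multi-modal tensor, and the head is replaced with a single real-vs-fake logit. The 6-channel layout (3 RGB + 2 MV components + 1 IM) and the layer indices are illustrative assumptions; the paper's exact input stacking and training setup may differ.

```python
# Sketch: MobileNetV3-Small adapted for multi-modal (RGB + MV + IM) input and
# binary real/fake classification. The channel layout is an assumption, not
# the paper's exact configuration.
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v3_small

def make_detector(in_channels: int = 6) -> nn.Module:
    model = mobilenet_v3_small(weights=None)
    # Swap the stem conv so it accepts the stacked multi-modal tensor.
    old = model.features[0][0]
    model.features[0][0] = nn.Conv2d(
        in_channels, old.out_channels,
        kernel_size=old.kernel_size, stride=old.stride,
        padding=old.padding, bias=False,
    )
    # Replace the 1000-way ImageNet head with a single real-vs-fake logit.
    model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, 1)
    return model

detector = make_detector(in_channels=6)   # e.g. 3 RGB + 2 MV + 1 IM channels
x = torch.randn(2, 6, 224, 224)           # dummy batch of stacked inputs
prob_fake = torch.sigmoid(detector(x))    # train with BCEWithLogitsLoss
```

Keeping the backbone a MobileNetV3 preserves the low per-frame cost that makes the method attractive for real-time detection.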
Datasets
FaceForensics++ (high-quality version, c23 compression)
Model(s)
MobileNetV3 (with modifications for multi-modal input and binary classification)
Author countries
Switzerland