Shaking the Fake: Detecting Deepfake Videos in Real Time via Active Probes

Authors: Zhixin Xie, Jun Luo

Published: 2024-09-17 04:58:30+00:00

AI Summary

SFake is a real-time deepfake detection method that actively introduces controllable blur features into video footage by inducing vibrations on the smartphone. It then determines deepfake presence based on the consistency of the facial area with the induced probe pattern, outperforming passive methods in accuracy, speed, and memory usage.

Abstract

Real-time deepfake, a type of generative AI, is capable of creating non-existent content (e.g., swapping one's face with another) in a video. It has, very unfortunately, been misused to produce deepfake videos (during web conferences, video calls, and identity authentication) for malicious purposes, including financial scams and political misinformation. Deepfake detection, as the countermeasure against deepfake, has attracted considerable attention from the academic community, yet existing works typically rely on learning passive features that may perform poorly beyond seen datasets. In this paper, we propose SFake, a new real-time deepfake detection method that exploits deepfake models' inability to adapt to physical interference. Specifically, SFake actively sends probes to trigger mechanical vibrations on the smartphone, producing a controllable feature in the footage. SFake then determines whether the face is swapped by deepfake based on the consistency of the facial area with the probe pattern. We implement SFake, evaluate its effectiveness on a self-built dataset, and compare it with six other detection methods. The results show that SFake outperforms the other detection methods with higher detection accuracy, faster processing speed, and lower memory consumption.


Key findings
SFake outperforms six other deepfake detection methods, with accuracy exceeding 95.2%, a processing time under 5 seconds, and memory consumption below 450 MB. Its performance is robust across different deepfake algorithms and relatively insensitive to lighting conditions, but it is sensitive to resolution and distance.
Approach
SFake actively triggers vibrations on the smartphone during video calls, causing controllable blur in the footage. It then checks whether the blur in the facial area is consistent with the induced probe pattern, leveraging the deepfake model's inability to adapt to this physical interference.
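The consistency check can be illustrated with a minimal sketch. It is not the authors' implementation: the blur measure (variance of a Laplacian), the on/off probe encoding, the Pearson correlation score, and all function names (`laplacian_variance`, `probe_consistency`, `box_blur`) are assumptions chosen for illustration, and the frames are synthetic.

```python
import numpy as np

def laplacian_variance(gray):
    """Sharpness score: variance of a 4-neighbour Laplacian (lower = blurrier)."""
    g = gray.astype(np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())

def probe_consistency(face_crops, probe_on):
    """Pearson correlation between per-frame facial blur and the probe.

    face_crops: list of 2-D grayscale arrays (facial area of each frame).
    probe_on:   sequence of 0/1 flags, 1 where the vibration was active.
    Both the blur measure and the score are illustrative assumptions.
    """
    blur = np.array([-laplacian_variance(c) for c in face_crops])
    probe = np.asarray(probe_on, dtype=np.float64)
    if blur.std() == 0 or probe.std() == 0:
        return 0.0  # correlation is undefined for a constant series
    return float(np.corrcoef(blur, probe)[0, 1])

def box_blur(gray, k=5):
    """Horizontal box blur, a crude stand-in for vibration-induced blur."""
    kernel = np.ones(k) / k
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, gray)

# Synthetic demo: 20 frames, vibration active on every second frame.
rng = np.random.default_rng(0)
base = rng.random((32, 32))
probe = [0, 1] * 10

def noisy(img):
    """Add mild sensor-like noise so no two frames are identical."""
    return img + rng.normal(0.0, 0.01, img.shape)

# Genuine face: the facial area blurs whenever the phone vibrates.
genuine = [noisy(box_blur(base) if on else base) for on in probe]
# Swapped face (hypothetical): the generated face ignores the vibration.
swapped = [noisy(base) for _ in probe]

genuine_score = probe_consistency(genuine, probe)
swapped_score = probe_consistency(swapped, probe)
print(f"genuine: {genuine_score:.3f}, swapped: {swapped_score:.3f}")
```

In this toy setting the genuine sequence tracks the probe closely while the swapped one does not. A real pipeline would also need face localization and synchronization between the probe and the received frames, neither of which is shown here.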
Datasets
A self-built dataset with 8 brands of smartphones, 15 participants, and 5 existing deepfake algorithms.
Model(s)
A simple two-layer neural network is used as a classifier; existing deepfake detection models (SBI, FaceAF, CnnDetect, LRNet, DFHob, Deepaware) are used for comparison.
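For orientation, a two-layer classifier of this kind could look like the sketch below. The summary does not describe the input features, layer sizes, activations, or training procedure, so everything here is an assumption: a single hypothetical consistency score as input, 8 hidden ReLU units, a sigmoid output, and plain gradient descent on synthetic data.

```python
import numpy as np

def init_params(n_in, n_hidden, seed=0):
    """Random weights for a two-layer network (sizes are assumptions)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.5, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.5, (n_hidden, 1)),
        "b2": np.zeros(1),
    }

def forward(params, X):
    """Return the deepfake probability per sample plus cached activations."""
    z1 = X @ params["W1"] + params["b1"]
    a1 = np.maximum(z1, 0.0)                      # ReLU hidden layer
    z2 = a1 @ params["W2"] + params["b2"]
    p = 1.0 / (1.0 + np.exp(-np.clip(z2, -30.0, 30.0)))  # sigmoid output
    return p[:, 0], (z1, a1)

def train_step(params, X, y, lr=0.1):
    """One full-batch gradient-descent step on binary cross-entropy."""
    p, (z1, a1) = forward(params, X)
    n = X.shape[0]
    dz2 = ((p - y) / n)[:, None]                  # gradient w.r.t. output logit
    dW2 = a1.T @ dz2
    db2 = dz2.sum(axis=0)
    dz1 = (dz2 @ params["W2"].T) * (z1 > 0)       # backprop through ReLU
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0)
    for name, grad in (("W1", dW1), ("b1", db1), ("W2", dW2), ("b2", db2)):
        params[name] -= lr * grad

# Synthetic data: one hypothetical "probe consistency" feature per clip.
rng = np.random.default_rng(1)
genuine = rng.normal(0.9, 0.05, 20)               # label 0: genuine face
swapped = rng.normal(0.0, 0.20, 20)               # label 1: deepfake
X = np.concatenate([genuine, swapped])[:, None]
y = np.concatenate([np.zeros(20), np.ones(20)])

params = init_params(n_in=1, n_hidden=8)
for _ in range(200):
    train_step(params, X, y)

probs, _ = forward(params, X)                     # predicted P(deepfake)
```

Thresholding `probs` (for example at 0.5) would give a per-clip decision; the threshold is likewise an illustrative choice.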
Author countries
Singapore