GOTCHA: Real-Time Video Deepfake Detection via Challenge-Response

Authors: Govind Mittal, Chinmay Hegde, Nasir Memon

Published: 2022-10-12 13:15:54+00:00

AI Summary

This paper proposes GOTCHA, a real-time video deepfake detection method using a challenge-response approach. It leverages a taxonomy of challenges targeting weaknesses in deepfake generation pipelines, achieving high AUC scores in both human and automated evaluations.

Abstract

With the rise of AI-enabled Real-Time Deepfakes (RTDFs), the integrity of online video interactions has become a growing concern. RTDFs have now made it feasible to replace an imposter's face with their victim in live video interactions. Such advancement in deepfakes also coaxes detection to rise to the same standard. However, existing deepfake detection techniques are asynchronous and hence ill-suited for RTDFs. To bridge this gap, we propose a challenge-response approach that establishes authenticity in live settings. We focus on talking-head style video interaction and present a taxonomy of challenges that specifically target inherent limitations of RTDF generation pipelines. We evaluate representative examples from the taxonomy by collecting a unique dataset comprising eight challenges, which consistently and visibly degrades the quality of state-of-the-art deepfake generators. These results are corroborated both by humans and a new automated scoring function, leading to 88.6% and 80.1% AUC, respectively. The findings underscore the promising potential of challenge-response systems for explainable and scalable real-time deepfake detection in practical scenarios. We provide access to data and code at https://github.com/mittalgovind/GOTCHA-Deepfakes


Key findings
Human evaluation yielded an AUC of 88.6%, while automated evaluation using a custom scoring model achieved an AUC of 80.1%. Challenges consistently degraded deepfake quality, though some challenges induced noticeably stronger degradation than others.
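The AUC figures above summarize how well fidelity scores separate real responses from deepfaked ones. As a refresher, AUC equals the probability that a randomly chosen genuine response outscores a randomly chosen fake one (the Mann-Whitney U formulation); a minimal, self-contained sketch of that computation (not the paper's evaluation code):

```python
def auc(pos_scores, neg_scores):
    """AUC via the Mann-Whitney U statistic: the probability that a random
    positive (genuine) score exceeds a random negative (deepfake) score,
    with ties counted as half a win."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores
               for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# Toy example: 3 of 4 genuine/fake pairs are ranked correctly.
print(auc([0.9, 0.3], [0.5, 0.1]))  # 0.75
```

Perfect separation gives 1.0 and chance-level scoring gives 0.5, so the reported 88.6% (human) and 80.1% (automated) both indicate strong, but imperfect, discrimination.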
Approach
GOTCHA presents interactive challenges (e.g., head movements, occlusions) to the remote participant in a live video feed. If the feed is a deepfake, the challenges exploit limitations of the generation pipeline and induce visible artifacts that both humans and automated scoring functions can detect.
Datasets
A novel dataset of 56,247 videos from 47 real users performing eight challenges, along with deepfakes generated using LIA, FSGAN, and DeepFaceLab.
Model(s)
A 3D-ResNet18-based self-supervised fidelity scoring model for automated evaluation; existing deepfake detection models (SBI and FTCN) were also evaluated but performed poorly.
Author countries
USA