GOTCHA: Real-Time Video Deepfake Detection via Challenge-Response

Authors: Govind Mittal, Chinmay Hegde, Nasir Memon

Published: 2022-10-12 13:15:54+00:00

Comment: Accepted to IEEE Euro S&P 2024

AI Summary

This paper introduces GOTCHA, a challenge-response system for detecting real-time video deepfakes (RTDFs) in live video interactions. It proposes a taxonomy of challenges designed to exploit inherent limitations of RTDF generation pipelines and evaluates representative challenges on a unique dataset covering eight of them. GOTCHA consistently degrades the quality of state-of-the-art deepfake generators, achieving 88.6% AUC with human evaluators and 80.1% AUC with an automated scoring function.

Abstract

With the rise of AI-enabled Real-Time Deepfakes (RTDFs), the integrity of online video interactions has become a growing concern. RTDFs have now made it feasible to replace an imposter's face with their victim in live video interactions. Such advancement in deepfakes also coaxes detection to rise to the same standard. However, existing deepfake detection techniques are asynchronous and hence ill-suited for RTDFs. To bridge this gap, we propose a challenge-response approach that establishes authenticity in live settings. We focus on talking-head style video interaction and present a taxonomy of challenges that specifically target inherent limitations of RTDF generation pipelines. We evaluate representative examples from the taxonomy by collecting a unique dataset comprising eight challenges, which consistently and visibly degrades the quality of state-of-the-art deepfake generators. These results are corroborated both by humans and a new automated scoring function, leading to 88.6% and 80.1% AUC, respectively. The findings underscore the promising potential of challenge-response systems for explainable and scalable real-time deepfake detection in practical scenarios. We provide access to data and code at https://github.com/mittalgovind/GOTCHA-Deepfakes.


Key findings
Challenges significantly aid in detecting RTDFs, inducing higher degradation scores in deepfake videos than in original videos (human AUC = 88.6%, automated AUC = 80.1%). Human evaluators reliably identified deepfake artifacts, particularly the 'vanishing object' artifact during occlusion and facial-deformation challenges. Human and machine evaluations agree on which challenges induce the most degradation, with occlusions and facial deformations being most effective.
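The AUC figures above summarize how well degradation scores separate deepfake responses from genuine ones. As a minimal sketch, the AUC can be computed as the pairwise probability that a fake response receives a higher degradation score than a real one (the scores below are illustrative, not taken from the paper):

```python
def auc(fake_scores, real_scores):
    """Pairwise AUC: probability that a deepfake response gets a higher
    degradation score than a genuine one (ties count as half a win)."""
    wins = sum(
        1.0 if f > r else 0.5 if f == r else 0.0
        for f in fake_scores
        for r in real_scores
    )
    return wins / (len(fake_scores) * len(real_scores))

# Hypothetical degradation scores for a handful of responses
fake = [0.9, 0.8, 0.7, 0.6]
real = [0.5, 0.4, 0.3, 0.65]
print(auc(fake, real))  # prints 0.9375: 15 of 16 pairs ranked correctly
```

An AUC of 0.5 would mean the scores carry no discriminative signal; the paper's 88.6% (human) and 80.1% (automated) indicate strong but imperfect separation.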
Approach
The approach proposes a challenge-response system where a defender presents specific, randomized tasks (challenges) to a suspected imposter during a live video call. These challenges are designed to induce visible artifacts and degrade the quality of real-time deepfakes by exploiting vulnerabilities in their generation pipelines. Detection is then performed by human evaluators or an automated ML-based fidelity scoring model combined with challenge-specific compliance detectors.
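One way the challenge-response flow described above could be orchestrated is sketched below. The challenge names, the number of challenges issued, and the decision threshold are illustrative assumptions, not values from the paper's code; the scorer stands in for the fidelity model and compliance detectors:

```python
import random

# Illustrative challenge types inspired by the paper's taxonomy
# (exact set and naming are hypothetical).
CHALLENGES = [
    "head_yaw", "head_pitch", "occlude_face_with_hand",
    "press_cheek", "change_expression", "flash_illumination",
]

def run_session(score_response, n_challenges=3, threshold=0.5, rng=None):
    """Issue randomized challenges and flag the caller as a deepfake
    if the mean degradation score exceeds the threshold.

    score_response: callable(challenge) -> degradation score in [0, 1],
    standing in for the fidelity model plus compliance detectors.
    """
    rng = rng or random.Random()
    issued = rng.sample(CHALLENGES, n_challenges)  # randomized, so an
    scores = [score_response(c) for c in issued]   # imposter cannot pre-record
    degradation = sum(scores) / len(scores)
    return {
        "challenges": issued,
        "degradation": degradation,
        "verdict": "deepfake" if degradation > threshold else "genuine",
    }

# Usage with a stubbed scorer that reports heavy degradation
result = run_session(lambda challenge: 0.8, rng=random.Random(0))
print(result["verdict"])  # prints "deepfake": stub scores exceed the threshold
```

Randomizing the challenge selection is what makes the protocol hard to pre-empt: the imposter cannot rehearse or pre-render responses for challenges they have not yet seen.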
Datasets
A novel dataset of 56,247 short real and fake videos collected from 47 legitimate users performing eight challenges. Deepfake videos were generated using LIA (Latent Image Animator), FSGAN (Face Swapping Generative Adversarial Network), and DFL (DeepFaceLab). FFHQ was used for pretraining an advanced DFL variant.
Model(s)
A self-supervised ML-based fidelity scoring model using a 3D-ResNet18 backbone, trained with a contrastive loss. Challenge-specific compliance detectors were also used, based on methods such as MediaPipe for face segmentation, yaw/pitch prediction for head-movement challenges, object detectors for facial deformations, EfficientNet for expression recognition, and peak analysis of face intensity for illumination changes.
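The fidelity model is trained contrastively on clip embeddings. As a minimal sketch, one common formulation is a margin-based triplet loss in cosine-similarity space; the paper's exact loss and architecture details are not reproduced here, and the toy vectors below merely stand in for 3D-ResNet18 features:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return dot(u, v) / (dot(u, u) ** 0.5 * dot(v, v) ** 0.5)

def contrastive_margin_loss(anchor, positive, negative, margin=0.5):
    """Pull the anchor toward the positive (genuine response clip) and
    push it at least `margin` away from the negative (deepfake clip)."""
    return max(0.0, margin + cosine(anchor, negative) - cosine(anchor, positive))

# Toy 3-D embeddings standing in for learned clip features
anchor   = [1.0, 0.0, 0.0]
positive = [0.9, 0.1, 0.0]  # genuine response: similar to anchor
negative = [0.0, 1.0, 0.0]  # deepfake response: dissimilar
loss = contrastive_margin_loss(anchor, positive, negative)
```

With such a loss, degradation in a deepfake's response to a challenge shows up as a larger embedding distance from genuine reference clips, which the scoring function can threshold.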
Author countries
USA