In Anticipation of Perfect Deepfake: Identity-anchored Artifact-agnostic Detection under Rebalanced Deepfake Detection Protocol

Authors: Wei-Han Wang, Chin-Yuan Yeh, Hsi-Wen Chen, De-Nian Yang, Ming-Syan Chen

Published: 2024-05-01 12:48:13+00:00

AI Summary

This paper introduces the Rebalanced Deepfake Detection Protocol (RDDP) to evaluate deepfake detectors under challenging conditions where genuine and forged videos share similar artifacts, anticipating future 'perfect' deepfakes. To detect such deepfakes, the authors propose ID-Miner, an identity-anchored, artifact-agnostic detector that focuses on facial action sequences rather than visual artifacts or appearances. ID-Miner outperforms baseline detectors in both conventional and RDDP evaluations, demonstrating its robustness against sophisticated deepfakes.

Abstract

As deep generative models advance, we anticipate deepfakes achieving perfection, generating no discernible artifacts or noise. However, current deepfake detectors, intentionally or inadvertently, rely on such artifacts for detection, as they are exclusive to deepfakes and absent in genuine examples. To bridge this gap, we introduce the Rebalanced Deepfake Detection Protocol (RDDP) to stress-test detectors under balanced scenarios where genuine and forged examples bear similar artifacts. We offer two RDDP variants: RDDP-WHITEHAT uses white-hat deepfake algorithms to create 'self-deepfakes,' genuine portrait videos that retain the underlying identity yet carry artifacts similar to those in deepfake videos; RDDP-SURROGATE employs surrogate functions (e.g., Gaussian noise) to process both genuine and forged examples, introducing equivalent noise and thereby sidestepping the need for deepfake algorithms. Towards detecting perfect deepfake videos that align with genuine ones, we present ID-Miner, a detector that identifies the puppeteer behind the disguise by focusing on motion rather than artifacts or appearances. As an identity-based detector, it authenticates videos by comparing them with reference footage. Equipped with an artifact-agnostic loss at the frame level and an identity-anchored loss at the video level, ID-Miner effectively singles out identity signals amidst distracting variations. Extensive experiments comparing ID-Miner with 12 baseline detectors under both conventional and RDDP evaluations on two deepfake datasets, along with additional qualitative studies, affirm the superiority of our method and the necessity of detectors designed to counter perfect deepfakes.
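The RDDP-SURROGATE idea — processing both genuine and forged examples with the same surrogate function so that neither class carries exclusive artifacts — can be sketched as follows. This is a minimal illustration, not the paper's implementation; the noise level `sigma` and the additive-Gaussian form are assumptions standing in for whatever surrogate function an evaluator chooses.

```python
import numpy as np

def surrogate_noise(frames: np.ndarray, sigma: float = 5.0, seed: int = 0) -> np.ndarray:
    """Apply additive Gaussian noise to every frame (pixel values in [0, 255]).

    Stands in for the surrogate function of RDDP-SURROGATE; sigma is illustrative.
    """
    rng = np.random.default_rng(seed)
    noisy = frames.astype(np.float64) + rng.normal(0.0, sigma, frames.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def rebalance(genuine: list, forged: list, sigma: float = 5.0):
    """Process BOTH classes with the same surrogate, so any detector relying on
    noise-like artifacts alone can no longer separate genuine from forged."""
    return ([surrogate_noise(v, sigma) for v in genuine],
            [surrogate_noise(v, sigma) for v in forged])
```

Because the same perturbation is applied to both classes, a detector that merely keys on generation noise sees it everywhere and must fall back on identity- or motion-level cues.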


Key findings
Existing deepfake detectors exhibit significant performance degradation (18% to 35% AUC drop) under the RDDP, highlighting their reliance on easily discernible artifacts. ID-Miner consistently outperforms 12 baseline detectors under RDDP evaluations, showing a much smaller performance drop (around 6% AUC drop). The method also demonstrates superior generalizability across different deepfake techniques and excels in puppeteer re-identification by focusing on action-based features.
Approach
The authors introduce the Rebalanced Deepfake Detection Protocol (RDDP) with two variants (WHITEHAT and SURROGATE) to create a balanced testing environment where deepfake artifacts are present in both genuine and forged examples. Their proposed detector, ID-Miner, uses a hierarchical approach: a pre-trained Facial Action Unit (FAU) extractor feeds into an artifact-agnostic encoder at the frame level and an identity-anchored GRU aggregator at the video level, trained with contrastive losses to focus on robust action sequences.
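The hierarchical pipeline described above can be sketched in miniature: per-frame FAU activations pass through a frame-level encoder, a GRU aggregates them into a video-level identity embedding, and authentication compares that embedding against reference footage. Everything here is an assumption for illustration — the dimensions, the single-layer hand-rolled GRU, the linear encoder, and the cosine threshold are not the paper's actual architecture or training procedure (which uses contrastive losses on a pre-trained FAU extractor).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MinimalIDPipeline:
    """Illustrative sketch: FAU features -> frame encoder -> GRU -> video embedding.

    All sizes and weights are random placeholders, not trained parameters.
    """
    def __init__(self, fau_dim=17, emb_dim=32, seed=0):
        rng = np.random.default_rng(seed)
        self.We = rng.normal(0, 0.1, (fau_dim, emb_dim))      # frame-level encoder
        self.Wz = rng.normal(0, 0.1, (2 * emb_dim, emb_dim))  # GRU update gate
        self.Wr = rng.normal(0, 0.1, (2 * emb_dim, emb_dim))  # GRU reset gate
        self.Wh = rng.normal(0, 0.1, (2 * emb_dim, emb_dim))  # GRU candidate state
        self.emb_dim = emb_dim

    def embed(self, fau_seq: np.ndarray) -> np.ndarray:
        """fau_seq: (T, fau_dim) per-frame facial action unit activations."""
        h = np.zeros(self.emb_dim)
        for f in fau_seq:
            x = np.tanh(f @ self.We)                      # frame-level encoding
            xz = np.concatenate([x, h])
            z = sigmoid(xz @ self.Wz)                     # update gate
            r = sigmoid(xz @ self.Wr)                     # reset gate
            h_cand = np.tanh(np.concatenate([x, r * h]) @ self.Wh)
            h = (1 - z) * h + z * h_cand                  # GRU aggregation over frames
        return h / (np.linalg.norm(h) + 1e-8)             # unit-norm video embedding

def authenticate(query_emb, reference_emb, threshold=0.5):
    """Identity-anchored decision: cosine similarity of query vs. reference footage."""
    return float(query_emb @ reference_emb) >= threshold
```

The point of the sketch is the data flow: artifacts live at the pixel level and never enter the pipeline, because only FAU-derived action sequences are encoded and aggregated.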
Datasets
VoxCeleb, Celeb-DF
Model(s)
ID-Miner (based on a pre-trained Facial Action Unit (FAU) extractor, an artifact-agnostic encoder, and a Gated Recurrent Unit (GRU) for video-level aggregation).
Author countries
Taiwan