Protecting Celebrities from DeepFake with Identity Consistency Transformer

Authors: Xiaoyi Dong, Jianmin Bao, Dongdong Chen, Ting Zhang, Weiming Zhang, Nenghai Yu, Dong Chen, Fang Wen, Baining Guo

Published: 2022-03-02 18:59:58+00:00

Comment: To Appear at CVPR 2022, code is available at https://github.com/LightDXY/ICT_DeepFake

AI Summary

This paper introduces the Identity Consistency Transformer (ICT), a novel face forgery detection method that leverages high-level identity semantics to detect inconsistencies between inner and outer face regions. By incorporating a unique consistency loss, ICT demonstrates superior generalization across diverse datasets and image degradation types. An enhanced version, ICT-Ref, further boosts performance by integrating additional celebrity identity information, making it highly effective for real-world deepfake detection.

Abstract

In this work we propose Identity Consistency Transformer, a novel face forgery detection method that focuses on high-level semantics, specifically identity information, and detects a suspect face by finding identity inconsistency between its inner and outer face regions. The Identity Consistency Transformer incorporates a consistency loss for identity consistency determination. We show that Identity Consistency Transformer exhibits superior generalization ability not only across different datasets but also across the various forms of image degradation found in real-world applications, including deepfake videos. The Identity Consistency Transformer can be easily enhanced with additional identity information when such information is available, and for this reason it is especially well-suited for detecting face forgeries involving celebrities. Code will be released at https://github.com/LightDXY/ICT_DeepFake


Key findings
The ICT method generalizes significantly better than state-of-the-art low-level texture-based methods, remaining robust on unseen datasets and under various forms of image degradation. The reference-assisted variant (ICT-Ref) further achieves state-of-the-art performance on conventional benchmarks and real-world celebrity deepfake videos, boosting accuracy to 100% in some cases. The proposed consistency loss is crucial: it contributes a 23-40% improvement in AUC, which validates its importance for identity consistency determination.
Approach
The Identity Consistency Transformer (ICT) is proposed to detect face forgeries by simultaneously learning identity information from both inner and outer face regions using a Vision Transformer backbone. It employs a novel consistency loss that encourages similarity between inner and outer identities from the same person and pushes them apart if they originate from different individuals. For enhanced celebrity deepfake detection, a reference-assisted variant (ICT-Ref) utilizes a pre-constructed set of authentic identity vector pairs for improved consistency comparison.
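
The pull-together, push-apart behaviour of the consistency loss described above can be illustrated with a minimal sketch. This is not the paper's formulation: the contrastive-style hinge form, the `margin` value, and the function names `cosine_similarity`, `consistency_loss`, and `inconsistency_score` are all assumptions chosen for illustration. The identity vectors are toy 2-D lists standing in for the inner and outer identity embeddings that the model would produce.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def consistency_loss(inner, outer, same_identity, margin=0.5):
    """Illustrative contrastive-style loss (assumed form, not the paper's).

    Same identity: penalise any gap between the similarity and 1,
    which pulls the inner and outer identity vectors together.
    Different identities: penalise similarity above `margin`,
    which pushes the two vectors apart.
    """
    sim = cosine_similarity(inner, outer)
    if same_identity:
        return 1.0 - sim
    return max(0.0, sim - margin)


def inconsistency_score(inner, outer):
    """Test-time score: higher means inner and outer identities disagree more."""
    return 1.0 - cosine_similarity(inner, outer)


# Toy identity vectors (hypothetical): a real face has matching inner and
# outer identities, while a swapped face pairs two different identities.
real_inner, real_outer = [1.0, 0.0], [1.0, 0.0]
fake_inner, fake_outer = [1.0, 0.0], [0.0, 1.0]

print(consistency_loss(real_inner, real_outer, same_identity=True))
print(consistency_loss(fake_inner, fake_outer, same_identity=False))
print(inconsistency_score(fake_inner, fake_outer))
```

In this sketch a face would be flagged as suspect when `inconsistency_score` exceeds a chosen threshold. A reference-assisted variant in the spirit of ICT-Ref would additionally compare the extracted vectors against stored authentic identity pairs, which is not shown here.
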
Datasets
MS-Celeb-1M, FaceForensics++ (FF++), DeepFake Detection (DFD), Celeb-DeepFake v1 (CD1), Celeb-DeepFake v2 (CD2), DeeperForensics (Deeper)
Model(s)
Identity Consistency Transformer (ICT), based on a Vision Transformer (ViT) architecture.
Author countries
China, USA