Protecting Celebrities from DeepFake with Identity Consistency Transformer

Authors: Xiaoyi Dong, Jianmin Bao, Dongdong Chen, Ting Zhang, Weiming Zhang, Nenghai Yu, Dong Chen, Fang Wen, Baining Guo

Published: 2022-03-02 18:59:58+00:00

AI Summary

This paper introduces Identity Consistency Transformer (ICT), a face forgery detection method that identifies inconsistencies between inner and outer face regions to detect deepfakes. ICT utilizes a consistency loss to improve identity consistency determination and shows superior generalization across datasets and image degradation forms, making it particularly suitable for detecting deepfakes of celebrities.

Abstract

In this work we propose Identity Consistency Transformer, a novel face forgery detection method that focuses on high-level semantics, specifically identity information, detecting a suspect face by finding identity inconsistency between inner and outer face regions. The Identity Consistency Transformer incorporates a consistency loss for identity consistency determination. We show that Identity Consistency Transformer exhibits superior generalization ability not only across different datasets but also across various types of image degradation forms found in real-world applications, including deepfake videos. The Identity Consistency Transformer can be easily enhanced with additional identity information when such information is available, and for this reason it is especially well-suited for detecting face forgeries involving celebrities. Code will be released at https://github.com/LightDXY/ICT_DeepFake


Key findings

ICT demonstrates superior generalization across various datasets and image degradation types compared to existing methods. The reference-assisted ICT-Ref achieves state-of-the-art performance on benchmark datasets. The proposed consistency loss significantly improves performance, highlighting the importance of high-level semantic analysis for deepfake detection.

Approach

ICT detects deepfakes by analyzing the consistency of identity information extracted from the inner and outer regions of a face. It uses a vision transformer to learn identity vectors for both regions simultaneously and incorporates a consistency loss that penalizes identity mismatches between the two regions. A reference-assisted version (ICT-Ref) leverages publicly available celebrity images for enhanced detection.
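The core detection step can be illustrated with a minimal sketch: given identity embeddings already extracted for the inner and outer face regions (here just NumPy vectors; the function names and the similarity threshold are illustrative assumptions, not taken from the released code), a face is flagged when the two identities disagree.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two identity embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_suspect(inner_id, outer_id, threshold=0.5):
    # A face is flagged as a suspected forgery when the identity
    # vectors of its inner and outer regions disagree; the 0.5
    # threshold is a placeholder, not a value from the paper.
    return cosine_similarity(inner_id, outer_id) < threshold

# Toy check: identical embeddings are consistent (not suspect),
# orthogonal embeddings are inconsistent (suspect).
v = np.ones(512)
w = np.zeros(512); w[0] = 1.0
u = np.zeros(512); u[1] = 1.0
print(is_suspect(v, v))  # consistent pair
print(is_suspect(w, u))  # inconsistent pair
```

In a face swap, the inner region carries the source identity while the outer region keeps the target identity, so exactly this kind of mismatch is the signal the method exploits.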
Datasets

MS-Celeb-1M (training); FaceForensics++, DeepFake Detection, Celeb-DeepFake v1, Celeb-DeepFake v2, DeeperForensics (testing)

Model(s)

Vision Transformer with a consistency loss. A reference-assisted version (ICT-Ref) enhances the model by incorporating a reference set of celebrity images.
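The reference-assisted idea can be sketched as a retrieval check: both regions of a genuine face should retrieve the same celebrity from a gallery of reference embeddings, while a swapped face retrieves different ones. This is a minimal illustration under the assumption that the gallery rows are precomputed identity embeddings; the function names are hypothetical.

```python
import numpy as np

def retrieve(query, gallery):
    # Index of the gallery reference most similar (by cosine) to the query.
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    return int(np.argmax(g @ q))

def ict_ref_consistent(inner_id, outer_id, gallery):
    # Sketch of the reference-assisted check: a genuine face should
    # retrieve the same celebrity reference with both its inner and
    # outer identity vectors; a face swap tends to retrieve two
    # different references.
    return retrieve(inner_id, gallery) == retrieve(outer_id, gallery)

# Toy gallery of four celebrity embeddings (one-hot for clarity).
gallery = np.eye(4)
print(ict_ref_consistent(gallery[0], gallery[0], gallery))  # same identity
print(ict_ref_consistent(gallery[0], gallery[1], gallery))  # swapped identity
```

Because reference images of celebrities are publicly abundant, this extra check is cheap to add, which is why the paper highlights the celebrity-protection use case.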
Author countries

China, USA