Real-centric Consistency Learning for Deepfake Detection

Authors: Ruiqi Zha, Zhichao Lian, Qianmu Li, Siqi Gu

Published: 2022-05-15 07:01:28+00:00

AI Summary

This paper proposes a novel real-centric consistency learning method for deepfake detection that focuses on learning invariant representations of both real and fake faces, rather than solely identifying artifacts. The method achieves this by constraining feature extraction at both the sample and feature levels, improving robustness to internet interference.

Abstract

Most previous deepfake detection research has focused on describing and discriminating artifacts in humanly perceptible ways, which biases the learned networks toward ignoring some critical intra-class invariant features and weakens robustness to internet interference. Essentially, the goal of deepfake detection is to represent natural faces and fake faces discriminatively in the representation space, which raises the question of whether the feature extraction procedure can be optimized by constraining intra-class consistency and inter-class inconsistency, pulling intra-class representations together and pushing inter-class representations apart. Therefore, inspired by contrastive representation learning, we tackle deepfake detection by learning invariant representations of both classes and propose a novel real-centric consistency learning method. We constrain the representations at both the sample level and the feature level. At the sample level, we take the deepfake synthesis procedure into consideration and propose a novel forgery-semantics-based pairing strategy to mine latent generation-related features. At the feature level, based on the centers of natural faces in the representation space, we design a hard positive mining and synthesizing method to simulate potential marginal features. In addition, a hard negative fusion method is designed to improve the discrimination of negative marginal features, aided by a supervised contrastive margin loss we developed. The effectiveness and robustness of the proposed method have been demonstrated through extensive experiments.
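The real-centric hard positive synthesis described in the abstract can be illustrated with a minimal sketch. The assumptions here are not from the paper: the real-class center is taken as the mean embedding, "hard" positives are the real embeddings farthest from that center, and synthesis is linear interpolation toward the center. The function names and the `k` and `alpha` parameters are hypothetical.

```python
import math

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def real_center(real_feats):
    # Center of natural (real) faces in the representation space,
    # assumed here to be the per-dimension mean of real embeddings.
    dim = len(real_feats[0])
    return [sum(f[i] for f in real_feats) / len(real_feats) for i in range(dim)]

def synthesize_hard_positives(real_feats, k=2, alpha=0.5):
    # Mine the k real embeddings farthest from the real-class center
    # (treated as "marginal" features) and synthesize new hard positives
    # by interpolating them toward the center; alpha controls how far
    # the synthetic point stays from the center.
    center = real_center(real_feats)
    ranked = sorted(real_feats, key=lambda f: l2_distance(f, center), reverse=True)
    return [
        [(1 - alpha) * c + alpha * x for c, x in zip(center, f)]
        for f in ranked[:k]
    ]
```

For example, with real embeddings `[[0, 0], [1, 0], [0, 1], [4, 4]]`, the center is `[1.25, 1.25]`, the farthest point is `[4, 4]`, and with `alpha=0.5` the synthesized hard positive lies halfway between them.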


Key findings

The proposed method outperforms existing methods on the Celeb-DF-v2 dataset and its compressed versions. The ablation study demonstrates the effectiveness of both the forgery-semantics-guided pairing strategy and the real-centric hard feature fusion method. The approach also shows improved robustness against compression artifacts.
Approach

The approach uses contrastive representation learning to learn invariant features for both real and fake faces. It introduces a forgery-semantics-based pairing strategy at the sample level and a real-centric hard feature fusion method at the feature level to improve the discriminative power of the learned representations, trained with a supervised contrastive margin loss.
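The exact form of the paper's supervised contrastive margin loss is not given in this summary. The sketch below shows one common way to add an additive margin to the supervised contrastive (SupCon) objective: each anchor-positive similarity is penalized by `margin` before the softmax, so positives must beat negatives by at least that gap. The `margin` and `temperature` hyperparameters and this particular formulation are assumptions for illustration.

```python
import math

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def supcon_margin_loss(feats, labels, temperature=0.1, margin=0.2):
    # Supervised contrastive loss with an additive margin on positive pairs.
    # For each anchor, every same-label sample is a positive; the margin is
    # subtracted from the anchor-positive similarity in both the numerator
    # and the corresponding denominator term. Hypothetical formulation; the
    # paper's exact margin design may differ.
    n = len(feats)
    total, count = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        others = [j for j in range(n) if j != i]
        for p in positives:
            num = math.exp((cosine(feats[i], feats[p]) - margin) / temperature)
            den = sum(
                math.exp((cosine(feats[i], feats[j]) - (margin if j == p else 0.0))
                         / temperature)
                for j in others
            )
            total += -math.log(num / den)
            count += 1
    return total / count
```

With well-clustered features the loss is near zero; labelling dissimilar samples as positives drives it up, which is the gradient signal that pulls intra-class representations together and pushes inter-class representations apart.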
Datasets

Celeb-DF-v2, along with its compressed versions (c23 and c40) to simulate internet interference.
Model(s)

Xception (pretrained on ImageNet) as the encoder and a two-layer perceptron as the projector. A linear classifier is trained with cross-entropy loss.
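A minimal sketch of the projection head that follows the encoder: a two-layer perceptron mapping backbone features into the contrastive representation space. The ReLU activation, the final L2 normalization (common in contrastive learning), and all weight shapes are assumptions for illustration; the Xception backbone itself is omitted.

```python
import math

def mlp_projector(feat, w1, b1, w2, b2):
    # Two-layer perceptron projection head: linear -> ReLU -> linear,
    # followed by L2 normalization so cosine similarity reduces to a
    # dot product. w1 has shape (hidden, in), w2 has shape (out, hidden).
    h = [max(0.0, sum(w * x for w, x in zip(row, feat)) + b)
         for row, b in zip(w1, b1)]
    z = [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(w2, b2)]
    norm = math.sqrt(sum(v * v for v in z)) or 1.0
    return [v / norm for v in z]
```

With identity weights and zero biases, the head reduces to L2 normalization: `mlp_projector([3.0, 4.0], ...)` returns `[0.6, 0.8]`. A separate linear classifier on the encoder features, trained with cross-entropy, then produces the real/fake decision.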
Author countries

China