Exposing Deepfake Face Forgeries with Guided Residuals

Authors: Zhiqing Guo, Gaobo Yang, Jiyou Chen, Xingming Sun

Published: 2022-05-02 08:58:19+00:00

AI Summary

This paper introduces GRnet, a guided residuals network for exposing deepfake face forgeries by effectively fusing spatial-domain and residual-domain features. It proposes a Manipulation Trace Extractor (MTE) that directly preserves manipulation traces by removing content features, and an Attention Fusion Mechanism (AFM) that adaptively combines the two feature streams. GRnet achieves state-of-the-art performance across several public deepfake datasets, demonstrating improved accuracy and robustness.

Abstract

Residual-domain features are very useful for Deepfake detection because they suppress irrelevant content features and preserve key manipulation traces. However, inappropriate residual prediction has side effects on detection accuracy. In addition, residual-domain features are easily affected by image operations such as compression. Most existing works exploit either spatial-domain or residual-domain features, neglecting that the two types of features are mutually correlated. In this paper, we propose a guided residuals network, namely GRnet, which fuses spatial-domain and residual-domain features in a mutually reinforcing way to expose face images generated by Deepfake. Unlike existing prediction-based residual extraction methods, we propose a manipulation trace extractor (MTE) to directly remove content features and preserve manipulation traces. MTE is a fine-grained method that avoids the potential bias caused by inappropriate prediction. Moreover, an attention fusion mechanism (AFM) is designed to selectively emphasize feature channel maps and adaptively allocate weights to the two streams. Experimental results show that the proposed GRnet achieves better performance than state-of-the-art works on four public fake face datasets: HFF, FaceForensics++, DFDC and Celeb-DF. In particular, GRnet achieves an average accuracy of 97.72% on the HFF dataset, at least 5.25% higher than existing works.


Key findings
GRnet consistently outperforms state-of-the-art methods across four public fake face datasets, achieving an average accuracy of 97.72% on HFF, at least 5.25% higher than existing works. The MTE effectively suppresses image content to highlight manipulation traces, and the AFM improves robustness by adaptively fusing spatial and residual features for both high- and low-quality images.
Approach
The authors propose GRnet, a dual-stream network that fuses spatial-domain and residual-domain features. It employs a novel Manipulation Trace Extractor (MTE) utilizing a guided filter to directly extract fine-grained manipulation traces by suppressing content features, avoiding prediction bias. An Attention Fusion Mechanism (AFM) then adaptively combines these spatial and residual features, selectively emphasizing channel maps and allocating weights based on stream loss values to enhance discriminative power.
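The core idea behind the MTE can be illustrated with a small sketch: a self-guided filter (guide equal to the input) smooths an image while preserving edges, so the low-frequency content stays in the filtered output and subtracting it leaves a residual carrying the high-frequency traces that manipulations tend to disturb. This is a minimal NumPy sketch under stated assumptions, not the paper's implementation; the window radius `r`, regularizer `eps`, and the self-guidance choice are illustrative, not values from the paper.

```python
import numpy as np

def box_mean(img, r):
    """Mean filter with a (2r+1)x(2r+1) window, edge-padded (helper for the guided filter)."""
    pad = np.pad(img, r, mode="edge")
    win = np.lib.stride_tricks.sliding_window_view(pad, (2 * r + 1, 2 * r + 1))
    return win.mean(axis=(2, 3))

def guided_residual(img, r=2, eps=1e-4):
    """Guided residual of a 2-D grayscale image in [0, 1].

    Standard guided-filter steps (He et al.) with the image as its own guide:
    fit a local linear model q = a*img + b per window, average the
    coefficients, then subtract the filtered (content) image from the input.
    r and eps are assumed hyperparameters for illustration only.
    """
    mean_i = box_mean(img, r)
    var_i = box_mean(img * img, r) - mean_i ** 2
    a = var_i / (var_i + eps)                   # per-pixel linear coefficient
    b = mean_i - a * mean_i
    q = box_mean(a, r) * img + box_mean(b, r)   # edge-preserving smoothed (content) image
    return img - q                              # residual: input minus content
```

A flat (pure-content) patch yields an all-zero residual, while high-frequency perturbations such as noise or splicing artifacts survive in the residual, which is the property the MTE exploits before the residual stream of the network.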
Datasets
Hybrid Fake Face (HFF), FaceForensics++ (FF++), DeepFake Detection Challenge (DFDC), Celeb-DF
Model(s)
GRnet (Guided Residuals Network) with ResNet-18 as backbone, Manipulation Trace Extractor (MTE), Attention Fusion Mechanism (AFM)
Author countries
China