Deepfake Forensics via an Adversarial Game

Authors: Zhi Wang, Yiwen Guo, Wangmeng Zuo

Published: 2021-03-25 02:20:08+00:00

AI Summary

This paper proposes adversarial training with a novel blurring-based attack to improve the generalization of deepfake detection models. The attack blurs out the high-frequency artifacts that deepfakes often exhibit, forcing the detector to learn more robust and generalizable features.

Abstract

With the progress in AI-based facial forgery (i.e., deepfake), people are increasingly concerned about its abuse. Although efforts have been made to train classification (also known as deepfake detection) models to recognize such forgeries, existing models suffer from poor generalization to unseen forgery technologies and high sensitivity to changes in image/video quality. In this paper, we advocate adversarial training for improving the generalization ability to both unseen facial forgeries and unseen image/video qualities. We believe training with samples that are adversarially crafted to attack the classification models improves the generalization ability considerably. Considering that AI-based face manipulation often introduces high-frequency artifacts that models spot easily but that generalize poorly, we further propose a new adversarial training method that attempts to blur out these specific artifacts by introducing pixel-wise Gaussian blurring models. With adversarial training, the classification models are forced to learn more discriminative and generalizable features, and the effectiveness of our method is verified by ample empirical evidence. Our code will be made publicly available.
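To make the training recipe concrete, below is a minimal, generic sketch of the adversarial training loop the abstract describes: an inner step crafts samples that attack the current detector, and an outer step updates the detector on them. The gradient-sign inner attack is only an illustrative placeholder, not the authors' blurring attack (see the sketch under Approach below); `model`, `optimizer`, and the step sizes are assumed names, not from the paper's code.

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, step_size=0.01, steps=3):
    """One adversarial training update: craft attacking samples, then train on them."""
    # Inner maximization: perturb the inputs to increase the detector's loss.
    x_adv = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = (x_adv + step_size * grad.sign()).clamp(0, 1)
        x_adv = x_adv.detach().requires_grad_(True)
    # Outer minimization: update the detector on the adversarially crafted samples.
    optimizer.zero_grad()
    F.cross_entropy(model(x_adv.detach()), y).backward()
    optimizer.step()
```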


Key findings
Adversarial training significantly improved the generalization of deepfake detection models to unseen forgery technologies and video qualities. The proposed blurring-based adversarial examples improved model robustness and generalization, and the two-generator approach further outperformed single-generator training.
Approach
The authors apply adversarial training to improve the generalization of deepfake detectors. They propose a novel adversarial attack based on pixel-wise Gaussian blurring that obscures high-frequency forgery artifacts; the per-pixel blur strength is controlled by a learned standard-deviation (sigma) map, as sketched below.
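A minimal sketch of such a pixel-wise Gaussian blur, assuming PyTorch and a sigma map of shape (B, 1, H, W); the function and variable names are illustrative, not taken from the paper's code. Because the operation is differentiable with respect to the sigma map, the map can be optimized by gradient methods to attack a detector.

```python
import torch
import torch.nn.functional as F

def pixelwise_gaussian_blur(x, sigma, kernel_size=5):
    """Blur x (B, C, H, W) with a per-pixel Gaussian; sigma is (B, 1, H, W)."""
    b, c, h, w = x.shape
    k, r = kernel_size, kernel_size // 2
    # Squared distance of each kernel tap from the center pixel.
    ys, xs = torch.meshgrid(torch.arange(-r, r + 1),
                            torch.arange(-r, r + 1), indexing="ij")
    dist2 = (xs ** 2 + ys ** 2).float().view(1, 1, k * k, 1, 1).to(x.device)
    # Per-pixel Gaussian weights, normalized over the k*k taps.
    sig2 = sigma.clamp(min=1e-3).pow(2).unsqueeze(2)      # (B, 1, 1, H, W)
    weights = torch.exp(-dist2 / (2.0 * sig2))
    weights = weights / weights.sum(dim=2, keepdim=True)  # (B, 1, k*k, H, W)
    # Weighted sum over the k*k shifted copies of the image.
    patches = F.unfold(x, k, padding=r).view(b, c, k * k, h, w)
    return (patches * weights).sum(dim=2)
```

Small sigma values leave a pixel nearly unchanged, while larger values average out local high-frequency detail, which is how such an attack can remove the forgery artifacts a detector relies on.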
Datasets
FaceForensics++ (FF++), DeepFakeDetection (DFD), Celeb-DF
Model(s)
EfficientNet, Xception, models from Stehouwer et al. [36] and Zhao et al. [37]
Author countries
China