A War Beyond Deepfake: Benchmarking Facial Counterfeits and Countermeasures

Authors: Minh Tam Pham, Thanh Trung Huynh, Van Vinh Tong, Thanh Tam Nguyen, Thanh Thi Nguyen, Hongzhi Yin, Quoc Viet Hung Nguyen

Published: 2021-11-25 05:01:08+00:00

AI Summary

This paper introduces a dual benchmarking framework for evaluating visual forgery and forensic techniques. It integrates state-of-the-art counterfeit generators and detectors, measuring their performance across various criteria and analyzing results to understand the ongoing 'war' between forgery and detection.

Abstract

In recent years, visual forgery has reached a level of sophistication at which humans cannot identify the fraud, which poses a significant threat to information security. A wide range of malicious applications have emerged, such as fake news, defamation or blackmailing of celebrities, impersonation of politicians in political warfare, and the spreading of rumours to attract views. As a result, a rich body of visual forensic techniques has been proposed in an attempt to stop this dangerous trend. In this paper, we present a benchmark that provides in-depth insights into visual forgery and visual forensics, using a comprehensive and empirical approach. More specifically, we develop an independent framework that integrates state-of-the-art counterfeit generators and detectors, and measures the performance of these techniques using various criteria. We also perform an exhaustive analysis of the benchmarking results to determine the characteristics of the methods, which serve as a comparative reference in this never-ending war between measures and countermeasures.


Key findings
XceptionNet and Capsule showed the best overall performance in ideal conditions, but were susceptible to noise and low resolution. GAN-fingerprint showed robustness to noise and compression. Traditional machine learning methods performed poorly, especially against advanced forgery techniques.
Approach
The authors developed an independent framework that integrates state-of-the-art forgery generators and detection methods. The performance of these techniques is measured using various criteria, and an exhaustive analysis of the results is performed.
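The evaluation described above can be sketched as a small loop that runs every detector over the same labelled samples and tabulates standard metrics. The detector names, the per-sample scores, and the choice of accuracy and AUC as the metric set are illustrative assumptions here, not the paper's actual implementation.

```python
# Minimal sketch of a dual-benchmarking evaluation loop (illustrative only;
# the detector scores below are made up, and the metric set is an assumption).

def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label (1 = forged)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formula."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def benchmark(detectors, labels):
    """Score every detector on the same labelled samples and collect metrics."""
    return {name: {"acc": accuracy(labels, scores), "auc": auc(labels, scores)}
            for name, scores in detectors.items()}

# Hypothetical per-sample forgery scores from two detectors on six samples.
labels = [1, 1, 1, 0, 0, 0]
detectors = {
    "XceptionNet": [0.9, 0.8, 0.7, 0.2, 0.1, 0.3],
    "MesoNet":     [0.6, 0.4, 0.7, 0.5, 0.2, 0.3],
}
results = benchmark(detectors, labels)
```

In the real benchmark the same loop would additionally be run under perturbed conditions (added noise, compression, reduced resolution) to expose the robustness differences reported in the key findings.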
Datasets
CelebA-HQ, DFDC, FaceForensics++, DeepFake-in-the-wild, Celeb-DF, UADFV, DF-TIMIT, and a new dual-benchmarking dataset (DBD) with 100,000 real images, 1,000,000 forged images, 19,000 real videos, and 21,095 forged videos.
Model(s)
MesoNet, Capsule, XceptionNet, GAN-fingerprint, FDBD, Visual-Artifacts, HPBD, FaceSwap-2D, FaceSwap-3D, 3DMM, DeepFake, StarGAN, ReenactGAN, Monkey-Net, X2Face.
Author countries
Vietnam, Australia, Germany