OpenForensics: Large-Scale Challenging Dataset For Multi-Face Forgery Detection And Segmentation In-The-Wild

Authors: Trung-Nghia Le, Huy H. Nguyen, Junichi Yamagishi, Isao Echizen

Published: 2021-07-30 08:15:41+00:00

AI Summary

This paper introduces OpenForensics, a large-scale dataset for multi-face forgery detection and segmentation in-the-wild, addressing the limitations of existing datasets by providing rich face-wise annotations and diverse scenarios. It also presents a benchmark suite evaluating state-of-the-art methods on this new dataset.

Abstract

The proliferation of deepfake media is raising concerns among the public and relevant authorities. It has become essential to develop countermeasures against forged faces in social media. This paper presents a comprehensive study on two new countermeasure tasks: multi-face forgery detection and segmentation in-the-wild. Localizing forged faces among multiple human faces in unrestricted natural scenes is far more challenging than the traditional deepfake recognition task. To promote these new tasks, we have created OpenForensics, the first large-scale, highly challenging dataset designed with rich face-wise annotations explicitly for face forgery detection and segmentation. With its rich annotations, our OpenForensics dataset has great potential for research in both deepfake prevention and general human face detection. We have also developed a suite of benchmarks for these tasks by conducting an extensive evaluation of state-of-the-art instance detection and segmentation methods on our newly constructed dataset in various scenarios. The dataset, benchmark results, code, and supplementary materials will be publicly available on our project page: https://sites.google.com/view/ltnghia/research/openforensics


Key findings
BlendMask achieved the best performance in both detection and segmentation tasks on standard images. YOLACT++ and BlendMask showed the highest robustness on unseen images. Human participants struggled to identify forged faces in OpenForensics, highlighting the realism of the dataset and the challenge of the task.
Approach
The authors created OpenForensics in three steps: collecting real images, synthesizing forged faces with GANs and pose transformation (an approach chosen to avoid repeated deepfake model training), and applying diverse perturbations to simulate real-world scenarios. They then benchmarked state-of-the-art instance detection and segmentation models on the dataset.
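To make the face-wise detection task concrete, the sketch below shows one simple way to score predicted faces against annotated ones. It is an illustration only, not the paper's evaluation protocol: the `Face` record, the "real"/"fake" label names, the corner-format boxes, the 0.5 IoU threshold, and the greedy matching rule are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class Face:
    """Hypothetical per-face record; field names are illustrative."""
    box: Box
    label: str          # "real" or "fake" (assumed label names)
    score: float = 1.0  # detector confidence; ignored for ground truth


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_faces(preds: List[Face], truths: List[Face],
                thresh: float = 0.5) -> Tuple[int, int, int]:
    """Greedily match predictions to ground-truth faces.

    Predictions are visited in descending score order. A prediction counts
    as a true positive if it overlaps a not-yet-matched ground-truth face
    of the same label with IoU >= thresh. Returns (tp, fp, fn).
    """
    used = set()
    tp = 0
    for pred in sorted(preds, key=lambda f: f.score, reverse=True):
        best_idx, best_iou = -1, thresh
        for idx, truth in enumerate(truths):
            if idx in used or truth.label != pred.label:
                continue
            overlap = iou(pred.box, truth.box)
            if overlap >= best_iou:
                best_idx, best_iou = idx, overlap
        if best_idx >= 0:
            used.add(best_idx)
            tp += 1
    return tp, len(preds) - tp, len(truths) - tp


if __name__ == "__main__":
    truths = [Face((0, 0, 10, 10), "real"), Face((20, 0, 30, 10), "fake")]
    preds = [Face((0, 0, 10, 10), "real", 0.9),
             Face((21, 0, 31, 10), "real", 0.8)]  # forged face mislabeled
    print(match_faces(preds, truths))
```

In this toy image, the second prediction localizes the forged face well but labels it "real", so it counts as both a false positive and a missed forgery. Benchmarks of instance detection and segmentation models typically report average precision over many thresholds rather than a single count like this.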
Datasets
OpenForensics dataset (created by the authors), Google Open Images, FaceForensics++, DFDC, Celeb-DF, DeeperForensics
Model(s)
Mask R-CNN, MS R-CNN, RetinaMask, YOLACT, YOLACT++, CenterMask, BlendMask, PolarMask, MEInst, CondInst, SOLO, SOLOv2, XceptionNet (for forgery justification)
Author countries
Japan