DFGC 2021: A DeepFake Game Competition

Authors: Bo Peng, Hongxing Fan, Wei Wang, Jing Dong, Yuezun Li, Siwei Lyu, Qi Li, Zhenan Sun, Han Chen, Baoying Chen, Yanjie Hu, Shenghai Luo, Junrui Huang, Yutong Yao, Boyuan Liu, Hefei Ling, Guosheng Zhang, Zhiliang Xu, Changtao Miao, Changlei Lu, Shan He, Xiaoyan Wu, Wanyi Zhuang

Published: 2021-06-02 15:10:13+00:00

AI Summary

This paper summarizes the DeepFake Game Competition (DFGC) 2021, a competition designed to benchmark the adversarial game between DeepFake creation and detection methods. The competition involved alternating phases of DeepFake creation and detection, with participants submitting datasets and detection models respectively.

Abstract

This paper presents a summary of the DFGC 2021 competition. DeepFake technology is developing fast, and realistic face-swaps are increasingly deceiving and hard to detect. At the same time, DeepFake detection methods are also improving. There is a two-party game between DeepFake creators and detectors. This competition provides a common platform for benchmarking the adversarial game between current state-of-the-art DeepFake creation and detection methods. In this paper, we present the organization, results and top solutions of this competition and also share our insights obtained during this event. We also release the DFGC-21 testing dataset collected from our participants to further benefit the research community.


Key findings
The competition demonstrated that adversarial attacks and novel DeepFake creation methods pose significant challenges to detection models. Although the top detection solutions achieved high performance, they relied heavily on data augmentation and on adversarial examples generated during training. Cross-dataset analysis showed that the detection models generalize poorly.
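The findings mention adversarial examples generated during training. The sketch below illustrates that idea under two stand-in assumptions. The attack is single-step FGSM (fast gradient sign method). The "detector" is a toy logistic-regression model over flattened pixel features, where label 1 means fake. The source does not say which attacks or architectures the top teams actually used. The names `predict`, `bce_loss`, `fgsm` and `adv_train_step` are hypothetical. Each training batch is augmented with perturbed copies before the gradient step.

```python
import math
from typing import List, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: Vector, b: float, x: Vector) -> float:
    """Fake probability from a toy linear detector (hypothetical stand-in for a CNN)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def bce_loss(w: Vector, b: float, x: Vector, y: int) -> float:
    """Binary cross-entropy of the detector on one example (label 1 = fake)."""
    p = predict(w, b, x)
    eps = 1e-12  # avoids log(0)
    return -(y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps))


def _sign(g: float) -> float:
    return float((g > 0) - (g < 0))


def fgsm(w: Vector, b: float, x: Vector, y: int, epsilon: float) -> Vector:
    """One-step FGSM: move x by epsilon along sign(dLoss/dx), clipped to [0, 1]."""
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]  # dLoss/dx of cross-entropy for a linear logit
    return [min(1.0, max(0.0, xi + epsilon * _sign(gi))) for xi, gi in zip(x, grad)]


def adv_train_step(w: Vector, b: float, batch: List[Tuple[Vector, int]],
                   epsilon: float, lr: float) -> Tuple[Vector, float]:
    """One gradient step on the clean batch plus its FGSM-perturbed copies."""
    augmented = batch + [(fgsm(w, b, x, y, epsilon), y) for x, y in batch]
    gw = [0.0] * len(w)
    gb = 0.0
    for x, y in augmented:
        err = predict(w, b, x) - y  # dLoss/dlogit
        for i, xi in enumerate(x):
            gw[i] += err * xi
        gb += err
    n = len(augmented)
    return [wi - lr * g / n for wi, g in zip(w, gw)], b - lr * gb / n


# Toy usage: perturb one fake sample, then take one adversarial-training step.
w, b = [1.0, -2.0], 0.0
x_fake = [0.5, 0.5]
x_adv = fgsm(w, b, x_fake, 1, epsilon=0.1)
print(bce_loss(w, b, x_fake, 1), bce_loss(w, b, x_adv, 1))
w, b = adv_train_step(w, b, [(x_fake, 1), ([0.2, 0.9], 0)], epsilon=0.1, lr=0.5)
```

A real detector, such as the EfficientNet models listed below, would obtain the input gradient through automatic differentiation. The linear model keeps that gradient, (p - y)·w, explicit.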
Approach
The competition used a multi-phase adversarial game format. Creation participants submitted DeepFake datasets, which were evaluated against the detection models from the previous phase. Detection participants submitted models, which were evaluated against the DeepFake datasets from the previous phase. Creation submissions were scored on both image quality and their ability to deceive the detection models.
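The cross-evaluation between phases is sketched below. This is a hypothetical illustration, not the official scoring code. It assumes that each detector outputs a fake probability per image. It also assumes that detection is measured by ROC AUC against a fixed pool of real images. Finally, it assumes that a creation score mixes a deception term (1 minus the previous-phase detectors' AUC) with a quality score through a made-up weight `alpha`. All function names are illustrative.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical interface: a detector maps an image id to a fake probability in [0, 1].
Detector = Callable[[str], float]


def auc(real_scores: List[float], fake_scores: List[float]) -> float:
    """ROC AUC as P(fake scored above real); ties count as half a win."""
    wins = 0.0
    for f in fake_scores:
        for r in real_scores:
            if f > r:
                wins += 1.0
            elif f == r:
                wins += 0.5
    return wins / (len(fake_scores) * len(real_scores))


def score_detector(det: Detector, real_ids: List[str],
                   prev_fake_sets: Dict[str, List[str]]) -> float:
    """Detection track: mean AUC over the fake datasets submitted in the previous phase."""
    real = [det(i) for i in real_ids]
    return mean(auc(real, [det(i) for i in fakes]) for fakes in prev_fake_sets.values())


def score_creation(fake_ids: List[str], prev_detectors: Dict[str, Detector],
                   real_ids: List[str], quality: float, alpha: float = 0.5) -> float:
    """Creation track: reward fooling the previous phase's detectors, plus image quality.

    The linear mix and `alpha` are assumptions, not the competition's formula.
    """
    deception = mean(
        1.0 - auc([d(i) for i in real_ids], [d(i) for i in fake_ids])
        for d in prev_detectors.values()
    )
    return alpha * deception + (1.0 - alpha) * quality


# Toy usage: one detector defined by a lookup table of fake probabilities.
probs = {"r1": 0.1, "r2": 0.2, "f1": 0.9, "f2": 0.3}


def detector(img_id: str) -> float:
    return probs[img_id]


print(score_detector(detector, ["r1", "r2"], {"team_a": ["f1", "f2"]}))
print(score_creation(["f1", "f2"], {"det_1": detector}, ["r1", "r2"], quality=0.8))
```

Each phase scores one track's new submissions only against the other track's submissions from the phase before. Neither side therefore sees the opponents it is scored against while building its entry.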
Datasets
Celeb-DF v2 dataset (training and test sets). Participants in the creation track were tasked with creating 1000 face-swap images from specified frames of the Celeb-DF test set. The detection track used the Celeb-DF train set and the created fake images for training and evaluation.
Model(s)
Various models were used by participants, including EfficientNet-B3, EfficientNet-B7, FaceShifter, AutoEncoder, and ResNet18. Specific models varied by team and track (creation or detection).
Author countries
China, USA