An Examination of Fairness of AI Models for Deepfake Detection

Authors: Loc Trinh, Yan Liu

Published: 2021-05-02 21:55:04+00:00

AI Summary

This paper investigates bias in deepfake detection models, finding significant disparities in predictive performance across racial groups (up to a 10.7% difference in error rate). The bias stems from the datasets used for training, which are overwhelmingly composed of Caucasian subjects and contain disproportionate numbers of 'irregular' deepfakes (where faces are swapped onto individuals of different races or genders).

Abstract

Recent studies have demonstrated that deep learning models can discriminate based on protected classes like race and gender. In this work, we evaluate bias present in deepfake datasets and detection models across protected subgroups. Using facial datasets balanced by race and gender, we examine three popular deepfake detectors and find large disparities in predictive performances across races, with up to 10.7% difference in error rate between subgroups. A closer look reveals that the widely used FaceForensics++ dataset is overwhelmingly composed of Caucasian subjects, with the majority being female Caucasians. Our investigation of the racial distribution of deepfakes reveals that the methods used to create deepfakes as positive training signals tend to produce irregular faces - when a person's face is swapped onto another person of a different race or gender. This causes detectors to learn spurious correlations between the foreground faces and fakeness. Moreover, when detectors are trained with the Blended Image (BI) dataset from Face X-Rays, we find that those detectors develop systematic discrimination towards certain racial subgroups, primarily female Asians.


Key findings
Deepfake detectors exhibit significant racial bias, with error rates varying substantially across subgroups. This bias is linked to dataset imbalances in FaceForensics++ and to the creation of 'irregular' deepfakes during training. Models trained with blended images showed the most pronounced bias, particularly towards female Asian subjects.
Approach
The researchers evaluated three popular deepfake detectors (MesoInception4, Xception, and Face X-Ray) trained on FaceForensics++ and on the Blended Image (BI) dataset from Face X-Ray. They assessed performance on racially balanced datasets (RFW and UTKFace) across racial and gender subgroups, measuring AUC, error rate, true positive rate (TPR), and false positive rate (FPR) to quantify bias.
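
Below is a minimal sketch of the kind of per-subgroup evaluation described above, assuming detector scores have been collected into a pandas DataFrame with columns `score` (predicted fake probability), `label` (1 = fake, 0 = real), `race`, and `gender`. These column names and the 0.5 decision threshold are illustrative assumptions, not details taken from the paper.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score


def subgroup_metrics(df: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """Compute AUC, error rate, TPR, and FPR for each (race, gender) subgroup.

    Assumes columns 'score', 'label' (1 = fake, 0 = real), 'race', 'gender'.
    """
    rows = []
    for (race, gender), grp in df.groupby(["race", "gender"]):
        pred = (grp["score"] >= threshold).astype(int)
        tp = ((pred == 1) & (grp["label"] == 1)).sum()
        fp = ((pred == 1) & (grp["label"] == 0)).sum()
        fn = ((pred == 0) & (grp["label"] == 1)).sum()
        tn = ((pred == 0) & (grp["label"] == 0)).sum()
        # AUC is only defined when the subgroup contains both real and fake samples.
        auc = (roc_auc_score(grp["label"], grp["score"])
               if grp["label"].nunique() == 2 else float("nan"))
        rows.append({
            "race": race,
            "gender": gender,
            "auc": auc,
            "error_rate": (fp + fn) / len(grp),
            "tpr": tp / (tp + fn) if (tp + fn) else float("nan"),
            "fpr": fp / (fp + tn) if (fp + tn) else float("nan"),
        })
    return pd.DataFrame(rows)


# Example usage (scores_df is a hypothetical table of detector outputs):
# metrics = subgroup_metrics(scores_df)
# The gap between the best and worst subgroup error rates is the kind of
# disparity the paper reports (up to 10.7% between racial subgroups).
# print(metrics["error_rate"].max() - metrics["error_rate"].min())
```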
Datasets
FaceForensics++, Racial Face-in-the-Wild (RFW), UTKFace, Google's DeepfakeDetection, Celeb-DF, DeeperForensics-1.0
Model(s)
MesoInception4, Xception, Face X-Ray
Author countries
USA