AI-Face: A Million-Scale Demographically Annotated AI-Generated Face Dataset and Fairness Benchmark

Authors: Li Lin, Santosh, Mingyang Wu, Xin Wang, Shu Hu

Published: 2024-06-02 15:51:33+00:00

AI Summary

This paper introduces AI-Face, a million-scale demographically annotated dataset of AI-generated face images, including real and fake faces from various sources. Using AI-Face, the authors conduct the first comprehensive fairness benchmark for AI face detectors, revealing biases and providing insights for fairer detector design.

Abstract

AI-generated faces have enriched many aspects of human life, such as entertainment, education, and art. However, they also pose risks of misuse. Detecting AI-generated faces has therefore become crucial, yet current detectors show biased performance across demographic groups. Such biases can be mitigated by designing algorithmic fairness methods, which usually require demographically annotated face datasets for model training. However, no existing dataset covers both demographic attributes and diverse generative methods, which hinders the development of fair detectors for AI-generated faces. In this work, we introduce the AI-Face dataset, the first million-scale demographically annotated AI-generated face image dataset, including real faces, faces from deepfake videos, and faces generated by Generative Adversarial Networks and Diffusion Models. Based on this dataset, we conduct the first comprehensive fairness benchmark to assess various AI face detectors and provide insights and findings to promote the fair design of future AI face detectors. Our AI-Face dataset and benchmark code are publicly available at https://github.com/Purdue-M2/AI-Face-FairnessBench


Key findings
Fairness-enhanced models generally perform best, but even these show biases towards minority groups (lighter skin tones, females). All detectors exhibit significant performance degradation under image post-processing and cross-dataset evaluation, highlighting the need for robustness and generalization improvements. Increasing training data size does not always improve fairness.
Approach
AI-Face is built by collecting real and AI-generated face images from multiple public datasets. Demographic annotations (skin tone, gender, age) are produced by a novel method that combines facial landmark detection, color analysis, and a lightweight CLIP-based annotator trained on IMDB-WIKI and designed to mitigate annotation bias. The fairness benchmark then evaluates 12 representative detectors across multiple metrics.
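The summary does not spell out the benchmark's fairness metrics. As a minimal sketch of one common family of group-fairness measures, the code below computes per-group false positive rates (real faces wrongly flagged as fake) and positive-prediction rates, then reports the largest gap across groups. The label convention (1 = fake, 0 = real), the binary thresholded predictions, and the function names `group_rates` and `max_gap` are illustrative assumptions. They are not taken from the paper or its benchmark code.

```python
from collections import defaultdict


def group_rates(labels, preds, groups):
    """Per-group false positive rate (FPR) and positive-prediction rate (PPR).

    Assumed conventions (hypothetical, not from the paper):
      labels: 1 = AI-generated (fake), 0 = real
      preds:  binary detector decisions after thresholding (1 = "fake")
      groups: demographic group of each image, e.g. "female", "male"
    """
    stats = defaultdict(lambda: {"n": 0, "pos_pred": 0, "neg": 0, "fp": 0})
    for y, p, g in zip(labels, preds, groups):
        s = stats[g]
        s["n"] += 1
        s["pos_pred"] += p          # counts images predicted as fake
        if y == 0:                  # real image
            s["neg"] += 1
            s["fp"] += p            # real image flagged as fake
    rates = {}
    for g, s in stats.items():
        fpr = s["fp"] / s["neg"] if s["neg"] else float("nan")
        rates[g] = {"fpr": fpr, "ppr": s["pos_pred"] / s["n"]}
    return rates


def max_gap(rates, key):
    """Largest difference of a per-group rate across groups (NaN groups skipped)."""
    values = [r[key] for r in rates.values() if r[key] == r[key]]  # NaN != NaN
    return max(values) - min(values) if values else 0.0


if __name__ == "__main__":
    # Toy data: two groups, four images each.
    labels = [0, 0, 1, 1, 0, 0, 1, 1]
    preds = [1, 0, 1, 1, 0, 0, 1, 0]
    groups = ["A"] * 4 + ["B"] * 4
    rates = group_rates(labels, preds, groups)
    print(rates)
    print("FPR gap:", max_gap(rates, "fpr"))
    print("Demographic parity gap:", max_gap(rates, "ppr"))
```

In this toy run, group A has an FPR of 0.5 and group B an FPR of 0.0, so the FPR gap is 0.5. A gap of zero would mean that real faces from every group are misflagged equally often. A real benchmark would also consider intersectional groups (for example, skin tone × gender) and threshold-free metrics such as per-group AUC.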
Datasets
AI-Face (created by the authors), FF++, DFDC, DFD, Celeb-DF-v2, GenData, DF-Platter, DeePhy, DF-1.0, A-FF++, A-DFD, A-DFDC, A-Celeb-DF-v2, A-DF-1.0, FFHQ, IMDB-WIKI, CelebA, Casual Conversations v2 (CCv2)
Model(s)
Xception, EfficientB4, ViT-B/16, F3Net, SPSL, SRM, UCF, UnivFD, CORE, DAW-FDD, DAG-FDD, PG-FDD
Author countries
USA