Age-Diverse Deepfake Dataset: Bridging the Age Gap in Deepfake Detection

Authors: Unisha Joshi

Published: 2025-08-06 05:18:01+00:00

AI Summary

This paper introduces an age-diverse deepfake dataset to mitigate age bias in deepfake detection. The dataset is created by combining existing datasets (Celeb-DF, FaceForensics++, UTKFace) with synthetic data, and its effectiveness is evaluated using three deepfake detection models (XceptionNet, EfficientNet, LipForensics). Models trained on this dataset showed improved fairness across age groups and better generalization.

Abstract

The challenges associated with deepfake detection are increasing significantly with the latest advancements in technology and the growing popularity of deepfake videos and images. Despite the presence of numerous detection models, demographic bias in deepfake datasets remains largely unaddressed. This paper focuses on mitigating age-specific bias by introducing an age-diverse deepfake dataset that improves fairness across age groups. The dataset is constructed through a modular pipeline that combines the existing Celeb-DF, FaceForensics++, and UTKFace datasets with synthetic data created to fill gaps in the age distribution. The effectiveness and generalizability of this dataset are evaluated using three deepfake detection models: XceptionNet, EfficientNet, and LipForensics. Evaluation metrics, including AUC, pAUC, and EER, revealed that models trained on the age-diverse dataset demonstrated fairer performance across age groups, improved overall accuracy, and higher generalization across datasets. This study contributes a reproducible, fairness-aware deepfake dataset and model pipeline that can serve as a foundation for future research in fairer deepfake detection. The complete dataset and implementation code are available at https://github.com/unishajoshi/age-diverse-deepfake-detection.
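
To make the evaluation metrics named in the abstract concrete, the following is a minimal sketch of how AUC, pAUC, and EER could be computed separately for each age group. It is not the paper's implementation. The function names, the age-group labels, the convention that label 1 means "fake", and the pAUC cutoff of FPR <= 0.1 are all assumptions made for illustration; the abstract does not state them.

```python
from collections import defaultdict


def roc_points(labels, scores):
    """Return ROC points (FPR, TPR), sweeping the threshold from high to low.

    Assumption: label 1 = fake, label 0 = real; a higher score means "more fake".
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one real and one fake sample")
    pairs = sorted(zip(scores, labels), reverse=True)
    points, tp, fp, i = [(0.0, 0.0)], 0, 0, 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all samples tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under(points, max_fpr=1.0):
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr].

    max_fpr=1.0 gives the full AUC; a smaller value gives an
    unnormalized partial AUC (pAUC).
    """
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Cut the last segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def equal_error_rate(points):
    """FPR at the point where FPR equals FNR (= 1 - TPR), interpolated."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        f0, f1 = x0 + y0 - 1.0, x1 + y1 - 1.0  # FPR - FNR at each end
        if f1 >= 0.0:
            t = -f0 / (f1 - f0)
            return x0 + t * (x1 - x0)
    return 1.0


def metrics_by_age_group(records, max_fpr=0.1):
    """records: iterable of (age_group, label, score) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for group, label, score in records:
        grouped[group][0].append(label)
        grouped[group][1].append(score)
    results = {}
    for group, (labels, scores) in grouped.items():
        pts = roc_points(labels, scores)
        results[group] = {
            "auc": area_under(pts),
            "pauc": area_under(pts, max_fpr),
            "eer": equal_error_rate(pts),
        }
    return results
```

Calling `metrics_by_age_group` on a detector's held-out predictions gives one row of metrics per age group; a large spread in AUC or EER between groups is one simple indicator of the kind of age bias the paper targets.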


Key findings
Models trained on the age-diverse dataset performed more fairly across age groups and achieved higher overall accuracy than models trained on the original, age-imbalanced datasets, as measured by AUC, pAUC, and EER. They also generalized better across datasets. EfficientNet achieved the best overall performance.
Approach
The author addresses age bias in deepfake detection by building an age-diverse dataset: existing datasets (Celeb-DF, FaceForensics++, UTKFace) are combined with synthetic data generated using SimSwap and InsightFace to fill gaps in the age distribution. Three pre-trained deepfake detection models (XceptionNet, EfficientNet, LipForensics) were then trained and evaluated on the new dataset.
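
The idea of filling age gaps with synthetic data can be illustrated with a small sketch that counts samples per age bin and reports how many synthetic samples each bin would need to match the best-represented one. This is only an illustration under stated assumptions: the bin boundaries, the function names, and the "match the largest bin" target are hypothetical and are not taken from the paper, and the actual generation step (SimSwap and InsightFace in the paper) is not shown.

```python
from collections import Counter

# Hypothetical age bins (inclusive bounds); the paper's actual bins may differ.
AGE_BINS = [(0, 10), (11, 18), (19, 35), (36, 50), (51, 120)]


def age_bin(age, bins=AGE_BINS):
    """Return the label of the bin containing `age`, e.g. '19-35'."""
    for low, high in bins:
        if low <= age <= high:
            return f"{low}-{high}"
    raise ValueError(f"age {age} is outside all bins")


def synthetic_quota(ages, bins=AGE_BINS):
    """Number of synthetic samples needed per bin to match the largest bin.

    Assumption: the balancing target is the count of the
    best-represented bin; other targets are equally plausible.
    """
    counts = Counter(age_bin(a, bins) for a in ages)
    target = max(counts.values())
    # Include empty bins so that completely missing age ranges show up.
    return {f"{low}-{high}": target - counts[f"{low}-{high}"]
            for low, high in bins}
```

For example, `synthetic_quota([5, 25, 25, 30, 40, 70])` reports a quota of zero for the 19-35 bin and positive quotas for every other bin, including the empty 11-18 bin.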
Datasets
Celeb-DF, FaceForensics++, UTKFace
Model(s)
XceptionNet, EfficientNet, LipForensics
Author countries
USA