Data-Driven Fairness Generalization for Deepfake Detection

Authors: Uzoamaka Ezeakunne, Chrisantus Eze, Xiuwen Liu

Published: 2024-12-21 01:28:35+00:00

AI Summary

This paper addresses fairness generalization in deepfake detection by proposing a data-driven framework that leverages synthetic data generation and model optimization. The framework balances demographic representation in training data, uses a multi-task learning architecture to optimize for both detection accuracy and fairness, and employs sharpness-aware minimization for improved generalization.
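The synthetic data generation step builds on self-blended images (SBI), where a pseudo-fake is produced by blending an image with a slightly transformed copy of itself under a soft mask. The sketch below illustrates the idea only: the pixel shift and the Gaussian-style blending mask are illustrative stand-ins, not the exact augmentations used by the authors.

```python
import numpy as np

def make_pseudo_fake(img, shift=2):
    """Toy SBI-style pseudo-fake: blend `img` with a shifted copy of itself.

    The shift transform and the centered soft mask are assumptions for
    illustration; the real SBI pipeline uses richer augmentations.
    """
    h, w = img.shape[:2]
    # "Transformed copy": the same image shifted by a few pixels.
    transformed = np.roll(img, shift, axis=1)
    # Soft blending mask concentrated at the image center.
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    mask = np.exp(-(((yy - cy) / (h / 4)) ** 2 + ((xx - cx) / (w / 4)) ** 2))
    if img.ndim == 3:
        mask = mask[..., None]  # broadcast over color channels
    # Convex combination: blended pixels stay in the original value range.
    return mask * transformed + (1 - mask) * img

rng = np.random.default_rng(0)
face = rng.random((64, 64, 3))   # stand-in for a cropped face image
fake = make_pseudo_fake(face)
```

Because blending is label-free, such pseudo-fakes can be generated per demographic group to balance the training distribution, which is the role synthetic data plays in this framework.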

Abstract

Despite the progress made in deepfake detection research, recent studies have shown that biases in the training data for these detectors can result in varying levels of performance across different demographic groups, such as race and gender. These disparities can lead to certain groups being unfairly targeted or excluded. Traditional methods often rely on fair loss functions to address these issues, but they underperform when applied to unseen datasets; hence, fairness generalization remains a challenge. In this work, we propose a data-driven framework for tackling the fairness generalization problem in deepfake detection by leveraging synthetic datasets and model optimization. Our approach focuses on generating and utilizing synthetic data to enhance fairness across diverse demographic groups. By creating a diverse set of synthetic samples that represent various demographic groups, we ensure that our model is trained on a balanced and representative dataset. This approach allows us to generalize fairness more effectively across different domains. We employ a comprehensive strategy that leverages synthetic data, a loss sharpness-aware optimization pipeline, and a multi-task learning framework to create a more equitable training environment, which helps maintain fairness across both intra-dataset and cross-dataset evaluations. Extensive experiments on benchmark deepfake detection datasets demonstrate the efficacy of our approach, surpassing state-of-the-art approaches in preserving fairness during cross-dataset evaluation. Our results highlight the potential of synthetic datasets in achieving fairness generalization, providing a robust solution for the challenges faced in deepfake detection.


Key findings

The proposed method achieved detection accuracy comparable to existing approaches while significantly improving fairness across demographic groups in both intra- and cross-dataset evaluations. The method showed a substantial reduction in demographic disparities compared to baselines, demonstrating the effectiveness of the synthetic data generation and multi-task learning framework.

Approach

The authors address fairness generalization by generating synthetic data using self-blended images (SBI) to balance demographic representation. A multi-task learning architecture with separate heads for deepfake detection and demographic classification is used, optimized with Sharpness-Aware Minimization (SAM) to improve fairness and generalization.
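Sharpness-Aware Minimization (SAM) seeks weights that sit in flat regions of the loss landscape: each update first ascends to the worst-case point within a small neighborhood of radius rho, then descends using the gradient taken at that perturbed point. A minimal sketch on a toy quadratic loss, assuming illustrative values for the step size `lr` and neighborhood radius `rho` (the paper applies SAM to a deep network, not this toy problem):

```python
import numpy as np

# Toy loss L(w) = 0.5 * ||w - t||^2 with its analytic gradient.
def loss(w, t):
    return 0.5 * np.sum((w - t) ** 2)

def grad(w, t):
    return w - t

def sam_step(w, t, lr=0.1, rho=0.05):
    """One SAM update: ascend to the worst-case neighbor within radius
    rho, then descend with the gradient evaluated at that point."""
    g = grad(w, t)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # worst-case perturbation
    g_adv = grad(w + eps, t)                      # gradient at perturbed weights
    return w - lr * g_adv

t = np.array([1.0, -2.0])  # minimizer of the toy loss
w = np.zeros(2)
for _ in range(100):
    w = sam_step(w, t)
# w approaches t up to a small SAM-induced residual
```

Optimizing the sharpness of the loss surface in this way is what lets the fairness achieved on the training distribution transfer more reliably to unseen (cross-dataset) domains.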

Datasets

FaceForensics++ (FF++), Deepfake Detection Challenge (DFDC), Celeb-DF

Model(s)

EfficientNet with two Multi-Layer Perceptron (MLP) heads (one for deepfake detection and one for demographic classification)
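The multi-task objective can be sketched as a shared feature extractor feeding two heads, one for real/fake detection and one for demographic classification, with the joint loss summing both terms. In this minimal numpy sketch the "backbone" is a fixed random projection standing in for EfficientNet, the heads are single linear layers rather than MLPs, and the trade-off weight `lam` is an assumption not specified in this summary:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    # Mean negative log-likelihood of the true class.
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

# Toy "backbone": project 32-dim inputs to an 8-dim shared representation.
W_backbone = rng.standard_normal((32, 8))
W_detect = rng.standard_normal((8, 2))   # head 1: real vs. fake
W_demo = rng.standard_normal((8, 4))     # head 2: 4 demographic groups

x = rng.standard_normal((16, 32))        # a batch of 16 samples
y_detect = rng.integers(0, 2, 16)        # real/fake labels
y_demo = rng.integers(0, 4, 16)          # demographic labels

feats = np.tanh(x @ W_backbone)          # shared features used by both heads
loss_detect = cross_entropy(softmax(feats @ W_detect), y_detect)
loss_demo = cross_entropy(softmax(feats @ W_demo), y_demo)

lam = 0.5                                # assumed task-weighting coefficient
total_loss = loss_detect + lam * loss_demo
```

Because both heads share the same features, gradients from the demographic head push the representation toward encoding demographic information evenly, which is how the architecture couples detection accuracy with fairness.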

Author countries

USA