Fair-FLIP: Fair Deepfake Detection with Fairness-Oriented Final Layer Input Prioritising

Authors: Tomasz Szandala, Fatima Ezzeddine, Natalia Rusin, Silvia Giordano, Omran Ayoub

Published: 2025-07-11 15:17:02+00:00

AI Summary

This paper introduces Fair-FLIP, a post-processing method for mitigating bias in deepfake detection models. Fair-FLIP reweights a model's final-layer inputs to reduce subgroup disparities by prioritizing features with low variability across demographic groups, improving fairness metrics without significantly impacting accuracy.

Abstract

Artificial Intelligence-generated content has become increasingly popular, yet its malicious use, particularly deepfakes, poses a serious threat to public trust and discourse. While deepfake detection methods achieve high predictive performance, they often exhibit biases across demographic attributes such as ethnicity and gender. In this work, we tackle the challenge of fair deepfake detection, aiming to mitigate these biases while maintaining robust detection capabilities. To this end, we propose a novel post-processing approach, referred to as Fairness-Oriented Final Layer Input Prioritising (Fair-FLIP), that reweights a trained model's final-layer inputs to reduce subgroup disparities, prioritising those with low variability while demoting highly variable ones. Experimental results comparing Fair-FLIP to both the baseline (without fairness-oriented de-biasing) and state-of-the-art approaches show that Fair-FLIP can enhance fairness metrics by up to 30% while maintaining baseline accuracy, with only a negligible reduction of 0.25%. Code is available on GitHub: https://github.com/szandala/fair-deepfake-detection-toolbox


Key findings
Fair-FLIP enhances fairness metrics (TPP, FPP, PPV, NPV) by up to 30% compared to the baseline and other state-of-the-art methods. This improvement comes with only a negligible reduction in accuracy (0.25%), demonstrating its effectiveness in mitigating bias while maintaining high predictive performance. Fair-FLIP also shows explainability comparable to that of the baseline model.
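These parity metrics compare subgroup-conditional rates. As a minimal sketch, assuming TPP and FPP denote true- and false-positive-rate parity and PPV/NPV the per-group positive and negative predictive values (the paper's exact definitions may differ), one could compute per-group rates and report the worst-case gap across groups:

```python
import numpy as np

def subgroup_rates(y_true, y_pred, groups):
    """Per-group TPR, FPR, PPV, NPV, plus the max absolute gap per metric."""
    rates = {}
    for g in np.unique(groups):
        m = groups == g
        t, p = y_true[m], y_pred[m]
        tp = np.sum((p == 1) & (t == 1))
        fp = np.sum((p == 1) & (t == 0))
        fn = np.sum((p == 0) & (t == 1))
        tn = np.sum((p == 0) & (t == 0))
        rates[g] = {
            "TPR": tp / max(tp + fn, 1),  # sensitivity within group g
            "FPR": fp / max(fp + tn, 1),
            "PPV": tp / max(tp + fp, 1),
            "NPV": tn / max(tn + fn, 1),
        }
    # Parity gap: spread of each rate across demographic groups (0 = perfect parity)
    gaps = {k: max(r[k] for r in rates.values()) - min(r[k] for r in rates.values())
            for k in ("TPR", "FPR", "PPV", "NPV")}
    return rates, gaps
```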
Approach
Fair-FLIP is a post-processing technique that reweights the final-layer features of a pre-trained deepfake detection model. It prioritizes features with low variability across different ethnic groups, reducing the impact of features that might encode ethnicity-specific biases. This is done without retraining the model or requiring demographic information during inference.
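A minimal sketch of the reweighting idea, assuming the detector ends in a linear head over a feature vector and that weights are set inversely to how much each feature's mean shifts across demographic subgroups on a labelled calibration set (the exact prioritisation rule is defined in the paper and the linked repository; `fair_flip_weights` and `reweighted_logits` are hypothetical names):

```python
import torch

@torch.no_grad()
def fair_flip_weights(features, groups, eps=1e-8):
    """features: (N, D) final-layer inputs from a calibration set;
    groups: (N,) demographic labels, used only at calibration time.
    Returns per-feature weights that demote features whose subgroup
    means vary the most (i.e., features likely encoding group identity)."""
    group_means = torch.stack(
        [features[groups == g].mean(dim=0) for g in groups.unique()]
    )                                      # (G, D): one mean vector per group
    variability = group_means.std(dim=0)   # per-feature spread across groups
    w = 1.0 / (variability + eps)          # low variability -> high priority
    return w / w.max()                     # normalise into (0, 1]

def reweighted_logits(classifier_head, features, w):
    """Inference: scale final-layer inputs by the fixed weights before the
    unchanged head -- no retraining, no demographic labels needed here."""
    return classifier_head(features * w)
```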
Datasets
A Kaggle dataset of 190,335 face images, evenly split between manipulated and authentic, with ethnicity labels obtained using a facial attribute model and manual inspection.
Model(s)
Vision Transformer (google/vit-base-patch16-224-in21k, pre-trained on ImageNet-21k)
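For reference, loading this backbone as a binary (authentic vs. manipulated) classifier with Hugging Face `transformers` might look like the following sketch; the paper's exact fine-tuning configuration may differ:

```python
from transformers import ViTForImageClassification, ViTImageProcessor

processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k",
    num_labels=2,  # authentic vs. manipulated
)
```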
Author countries
Switzerland