Ensemble-Based Deepfake Detection using State-of-the-Art Models with Robust Cross-Dataset Generalisation

Authors: Haroon Wahab, Hassan Ugail, Lujain Jaleel

Published: 2025-07-08 13:54:48+00:00

AI Summary

This research explores ensemble methods to improve the generalization of deepfake detection across diverse datasets. By combining predictions from multiple state-of-the-art models, the study demonstrates that ensembles provide more stable and reliable performance than individual models, addressing the challenge of poor generalization in out-of-distribution data.

Abstract

Machine learning-based Deepfake detection models have achieved impressive results on benchmark datasets, yet their performance often deteriorates significantly when evaluated on out-of-distribution data. In this work, we investigate an ensemble-based approach for improving the generalization of deepfake detection systems across diverse datasets. Building on a recent open-source benchmark, we combine prediction probabilities from several state-of-the-art asymmetric models proposed at top venues. Our experiments span two distinct out-of-domain datasets and demonstrate that no single model consistently outperforms others across settings. In contrast, ensemble-based predictions provide more stable and reliable performance in all scenarios. Our results suggest that asymmetric ensembling offers a robust and scalable solution for real-world deepfake detection where prior knowledge of forgery type or quality is often unavailable.


Key findings
Individual deepfake detection models vary substantially in performance from one dataset to another. Ensemble methods consistently achieve competitive or superior performance across datasets and are more robust and stable than individual models. This suggests that ensembles are a more reliable choice for real-world deepfake detection, where the data distribution is not known in advance.
Approach
The authors employ an ensemble-based approach, combining prediction probabilities from six state-of-the-art asymmetric deepfake detection models. Two ensemble variants are evaluated: skill-weighted probability averaging and simple unweighted averaging. These are applied to improve cross-dataset generalization.
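The two ensemble variants named above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the summary does not say how "skill" is measured, so the code assumes each model's weight is proportional to a validation score such as AUC. The function names, probabilities, and skill values are all hypothetical.

```python
from typing import Sequence


def unweighted_average(probs: Sequence[Sequence[float]]) -> list[float]:
    """Simple averaging: probs[m][i] is model m's fake-probability for sample i."""
    n_models = len(probs)
    # zip(*probs) regroups the scores so each tuple holds one sample's
    # probabilities from every model.
    return [sum(per_sample) / n_models for per_sample in zip(*probs)]


def skill_weighted_average(
    probs: Sequence[Sequence[float]], skills: Sequence[float]
) -> list[float]:
    """Weighted averaging with weights proportional to each model's skill score.

    ASSUMPTION: 'skill' is a non-negative per-model validation score (e.g. AUC).
    The source does not define how the weights are obtained.
    """
    if len(probs) != len(skills):
        raise ValueError("need exactly one skill score per model")
    total = sum(skills)
    if total <= 0:
        raise ValueError("skill scores must sum to a positive value")
    weights = [s / total for s in skills]  # normalise so the weights sum to 1
    return [
        sum(w * p for w, p in zip(weights, per_sample))
        for per_sample in zip(*probs)
    ]


# Hypothetical fake-probabilities from three models on four samples.
model_probs = [
    [0.90, 0.20, 0.60, 0.10],
    [0.70, 0.40, 0.30, 0.20],
    [0.80, 0.30, 0.90, 0.30],
]
# Hypothetical validation scores used as skill weights.
model_skills = [0.90, 0.60, 0.75]

simple_scores = unweighted_average(model_probs)
weighted_scores = skill_weighted_average(model_probs, model_skills)

# Threshold the ensemble probability at 0.5 to get a real/fake decision.
simple_labels = [int(p >= 0.5) for p in simple_scores]
weighted_labels = [int(p >= 0.5) for p in weighted_scores]
```

With equal skill scores the weighted variant reduces to the simple average, so the two differ only to the extent that the validation scores separate the models.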
Datasets
FaceForensics++, Celeb-DF-v2, UADFV
Model(s)
MesoInception-4, Xception, CORE, FFD, SRM, UCF
Author countries
UK