Evaluating Deepfake Detectors in the Wild
Authors: Viacheslav Pirogov, Maksim Artemev
Published: 2025-07-29 15:17:00+00:00
AI Summary
This paper evaluates the performance of state-of-the-art deepfake detectors on a novel, large-scale dataset (over 500,000 images) designed to mimic real-world scenarios. The results reveal that many detectors perform poorly under realistic conditions, with fewer than half achieving an AUC score above 60%.
Abstract
Deepfakes powered by advanced machine learning models present a significant and evolving threat to identity verification and the authenticity of digital media. Although numerous detectors have been developed to address this problem, their effectiveness on real-world data has yet to be tested. In this work we evaluate modern deepfake detectors, introducing a novel testing procedure designed to mimic real-world scenarios for deepfake detection. Using state-of-the-art deepfake generation methods, we create a comprehensive dataset containing more than 500,000 high-quality deepfake images. Our analysis shows that detecting deepfakes remains a challenging task: fewer than half of the deepfake detectors tested achieved an AUC score greater than 60%, with the lowest at 50%. We demonstrate that basic image manipulations, such as JPEG compression or image enhancement, can significantly reduce model performance. All code and data are publicly available at https://github.com/SumSubstance/Deepfake-Detectors-in-the-Wild.
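The robustness check the abstract describes, i.e. measuring how a detector's AUC degrades after a basic manipulation such as JPEG compression, can be sketched in a few lines. The snippet below is a minimal illustration under assumptions, not the paper's evaluation pipeline: `detector_score` is a hypothetical placeholder for any deepfake detector that returns a fake-probability, and the JPEG quality of 50 is an assumed setting.

```python
# Minimal sketch: compare a detector's AUC on clean images vs. the same
# images re-encoded as JPEG, to simulate a real-world compression pipeline.
import io

from PIL import Image
from sklearn.metrics import roc_auc_score


def jpeg_compress(image: Image.Image, quality: int = 50) -> Image.Image:
    """Re-encode an image as JPEG in memory to introduce compression artifacts."""
    buffer = io.BytesIO()
    image.convert("RGB").save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    return Image.open(buffer).copy()


def detector_score(image: Image.Image) -> float:
    """Hypothetical detector: returns a fake-probability in [0, 1].
    Plug in an actual deepfake detector here; this is a placeholder."""
    raise NotImplementedError


def auc_under_perturbation(images, labels, perturb=None) -> float:
    """AUC of the detector over images, optionally perturbed first.
    labels: 1 for deepfake, 0 for real."""
    scores = []
    for img in images:
        if perturb is not None:
            img = perturb(img)
        scores.append(detector_score(img))
    return roc_auc_score(labels, scores)


# Example usage, once a real detector is wired in:
# clean_auc = auc_under_perturbation(images, labels)
# jpeg_auc = auc_under_perturbation(images, labels, perturb=jpeg_compress)
# print(f"AUC clean: {clean_auc:.3f}, after JPEG q=50: {jpeg_auc:.3f}")
```

A drop in AUC between the clean and compressed runs indicates the detector relies on high-frequency cues that compression destroys, which is consistent with the degradation the paper reports for simple manipulations.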