SynthForensics: A Multi-Generator Benchmark for Detecting Synthetic Video Deepfakes
Authors: Roberto Leotta, Salvatore Alfio Sambataro, Claudio Vittorio Ragaglia, Mirko Casu, Yuri Petralia, Francesco Guarnera, Luca Guarnera, Sebastiano Battiato
Published: 2026-02-04 16:47:37+00:00
AI Summary
This paper introduces SynthForensics, the first human-centric benchmark designed for detecting purely synthetic video deepfakes generated by modern text-to-video (T2V) models. Comprising 6,815 unique videos from five state-of-the-art open-source T2V models, the dataset is meticulously constructed with a two-stage human-in-the-loop validation and provided in four compression versions for robustness testing. Experiments reveal that existing deepfake detectors are fragile and exhibit limited generalization to this new domain, often performing worse than random chance under heavy compression.
Abstract
The landscape of synthetic media has been irrevocably altered by text-to-video (T2V) models, whose outputs are rapidly approaching indistinguishability from reality. Critically, this technology is no longer confined to large-scale labs; the proliferation of efficient, open-source generators is democratizing the ability to create high-fidelity synthetic content on consumer-grade hardware. This makes existing face-centric and manipulation-based benchmarks obsolete. To address this urgent threat, we introduce SynthForensics, to the best of our knowledge the first human-centric benchmark for detecting purely synthetic video deepfakes. The benchmark comprises 6,815 unique videos from five architecturally distinct, state-of-the-art open-source T2V models. Its construction was underpinned by a meticulous two-stage, human-in-the-loop validation to ensure high semantic and visual quality. Each video is provided in four versions (raw, lossless, light, and heavy compression) to enable real-world robustness testing. Experiments demonstrate that state-of-the-art detectors are both fragile and exhibit limited generalization when evaluated on this new domain: we observe a mean performance drop of $29.19\\%$ AUC, with some methods performing worse than random chance, and top models losing over 30 points under heavy compression. The paper further investigates the efficacy of training on SynthForensics as a means to mitigate these observed performance gaps, achieving robust generalization to unseen generators ($93.81\\%$ AUC), though at the cost of reduced backward compatibility with traditional manipulation-based deepfakes. The complete dataset and all generation metadata, including the specific prompts and inference parameters for every video, will be made publicly available at [link anonymized for review].