Bridging the Gap: A Framework for Real-World Video Deepfake Detection via Social Network Compression Emulation

Authors: Andrea Montibeller, Dasara Shullani, Daniele Baracchi, Alessandro Piva, Giulia Boato

Published: 2025-08-12 09:11:31+00:00

AI Summary

This research introduces a framework that emulates social network video compression to improve deepfake detection. By estimating compression and resizing parameters from a small set of uploaded videos, the framework generates realistically degraded data for training deepfake detectors, bridging the gap between lab-based and real-world performance.

Abstract

The growing presence of AI-generated videos on social networks poses new challenges for deepfake detection, as detectors trained under controlled conditions often fail to generalize to real-world scenarios. A key factor behind this gap is the aggressive, proprietary compression applied by platforms like YouTube and Facebook, which launders low-level forensic cues. However, replicating these transformations at scale is difficult due to API limitations and data-sharing constraints. For these reasons, we propose the first framework that emulates the video-sharing pipelines of social networks by estimating compression and resizing parameters from a small set of uploaded videos. These parameters enable a local emulator capable of reproducing platform-specific artifacts on large datasets without direct API access. Experiments on FaceForensics++ videos shared via social networks demonstrate that our emulated data closely matches the degradation patterns of real uploads. Furthermore, detectors fine-tuned on emulated videos achieve performance comparable to those trained on actual shared media. Our approach offers a scalable and practical solution for bridging the gap between lab-based training and real-world deployment of deepfake detectors, particularly in the underexplored domain of compressed video content.
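The local emulator described in the abstract reproduces platform-style degradation without API access. As a rough, hedged illustration (not the authors' released implementation), the core operation can be approximated with ffmpeg: downscale to the target resolution and re-encode with H.264 at an estimated constant rate factor (CRF). The scale, CRF, and preset values below are placeholders standing in for parameters that would be estimated from real uploads.

```python
import subprocess

def emulate_upload(src, dst, width, height, crf=30, preset="medium"):
    """Re-encode `src` to mimic a social-network sharing pipeline.

    Illustrative only: in the paper's framework the resolution and CRF
    are estimated from a small set of videos actually uploaded to the
    target platform; the defaults here are arbitrary placeholders.
    """
    cmd = [
        "ffmpeg", "-y", "-v", "error", "-i", src,
        "-vf", f"scale={width}:{height}",      # platform-style resizing
        "-c:v", "libx264", "-crf", str(crf),   # platform-like H.264 compression
        "-preset", preset,
        "-an",                                  # drop audio (assumption: frame-based detectors)
        dst,
    ]
    subprocess.run(cmd, check=True)

# Example: emulate a 720p share with a hypothetical estimated CRF of 32
# emulate_upload("original.mp4", "emulated.mp4", 1280, 720, crf=32)
```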


Key findings
Detectors fine-tuned on emulated videos achieve performance comparable to those trained on actual shared media. The framework requires fewer than 50 uploaded videos per resolution for accurate emulation, and an ablation study suggests that at least 30 videos per resolution are needed for reliable CRF estimation.
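The finding that roughly 30-50 uploads per resolution suffice implies the CRF is estimated by aggregating over a small sample of (original, shared) pairs. A minimal sketch of one plausible estimator, not necessarily the paper's exact method: re-encode each original at candidate CRF values, pick the candidate whose output size best matches the corresponding shared file, and take the median across the set.

```python
import os
import statistics
import subprocess

def closest_crf(original, shared, candidates=range(18, 41)):
    """For one (original, shared) pair, pick the candidate CRF whose
    re-encode of the original best matches the shared file's size."""
    target = os.path.getsize(shared)
    best_crf, best_diff = None, float("inf")
    for crf in candidates:
        out = f"tmp_crf{crf}.mp4"
        subprocess.run(
            ["ffmpeg", "-y", "-v", "error", "-i", original,
             "-c:v", "libx264", "-crf", str(crf), "-an", out],
            check=True,
        )
        diff = abs(os.path.getsize(out) - target)
        if diff < best_diff:
            best_crf, best_diff = crf, diff
    return best_crf

def estimate_crf(pairs):
    """Aggregate per-pair estimates; the ablation suggests that around
    30 pairs per resolution are needed for a stable value."""
    return statistics.median(closest_crf(o, s) for o, s in pairs)
```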
Approach
The approach emulates social network video compression pipelines by estimating compression and resizing parameters from a small set of uploaded videos. These parameters are then used to process a larger dataset, creating emulated videos for training and evaluating deepfake detectors.
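Once per-platform, per-resolution parameters have been estimated, the same settings can be applied in bulk to produce emulated training data. A small sketch under the same assumptions as above: the parameter table, function names, and CRF values are hypothetical placeholders, with ffmpeg doing the resizing and re-encoding.

```python
import pathlib
import subprocess

# Hypothetical parameters estimated from a few real uploads per platform/resolution.
PARAMS = {
    ("facebook", 720): {"scale": "1280:720", "crf": 33},
    ("youtube", 1080): {"scale": "1920:1080", "crf": 28},
}

def emulate_dataset(src_dir, dst_dir, platform, resolution):
    """Re-encode every video in `src_dir` with the estimated settings."""
    p = PARAMS[(platform, resolution)]
    dst = pathlib.Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for video in sorted(pathlib.Path(src_dir).glob("*.mp4")):
        subprocess.run(
            ["ffmpeg", "-y", "-v", "error", "-i", str(video),
             "-vf", f"scale={p['scale']}",
             "-c:v", "libx264", "-crf", str(p["crf"]), "-an",
             str(dst / video.name)],
            check=True,
        )

# e.g. emulate_dataset("FFpp/original", "FFpp/emulated_fb_720p", "facebook", 720)
```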
Datasets
FaceForensics++ (FF++) dataset, with videos shared on Facebook, YouTube, and BlueSky.
Model(s)
DenseNet, InceptionNet, XceptionNet, and ResNet-50.
Author countries
Italy