SocialDF: Benchmark Dataset and Detection Model for Mitigating Harmful Deepfake Content on Social Media Platforms

Authors: Arnesh Batra, Anushk Kumar, Jashn Khemani, Arush Gumber, Arhan Jain, Somil Gupta

Published: 2025-06-05 19:39:28+00:00

AI Summary

This paper introduces SocialDF, a new dataset of real and deepfake videos from social media, addressing the limitations of existing datasets. It also proposes a novel LLM-based multi-factor detection approach combining facial recognition, speech transcription, and a multi-agent LLM pipeline for robust audio-visual deepfake detection.

Abstract

The rapid advancement of deep generative models has significantly improved the realism of synthetic media, presenting both opportunities and security challenges. While deepfake technology has valuable applications in entertainment and accessibility, it has emerged as a potent vector for misinformation campaigns, particularly on social media. Existing detection frameworks struggle to distinguish between benign and adversarially generated deepfakes engineered to manipulate public perception. To address this challenge, we introduce SocialDF, a curated dataset reflecting real-world deepfake challenges on social media platforms. This dataset encompasses high-fidelity deepfakes sourced from various online ecosystems, ensuring broad coverage of manipulative techniques. We propose a novel LLM-based multi-factor detection approach that combines facial recognition, automated speech transcription, and a multi-agent LLM pipeline to cross-verify audio-visual cues. Our methodology emphasizes robust, multi-modal verification techniques that incorporate linguistic, behavioral, and contextual analysis to effectively discern synthetic media from authentic content.


Key findings
The proposed LLM-based multi-factor approach achieved substantially higher accuracy (90.4%) than the baseline LipFD model (51.24%) on the SocialDF dataset. Among the LLMs tested, DeepSeek R-1 performed best. The results highlight the limitations of single-modality approaches and the effectiveness of a multimodal, context-aware approach for robust deepfake detection.
Approach
The proposed approach uses a two-stage pipeline. The first stage identifies individuals in the video using face recognition and transcribes speech using ASR. The second stage employs a multi-agent LLM pipeline to analyze the transcribed speech and identified individuals, cross-verifying audio-visual cues and assessing authenticity.
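The two-stage control flow described above can be sketched as follows. This is a minimal, hypothetical skeleton: the face-identification and transcription functions are stubs standing in for the paper's YOLO + FaceNet and Whisper components, the agents stand in for LLM calls, and the majority-vote aggregation rule is an assumption, not a detail taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List

# Stage-one stand-ins (hypothetical): the paper uses YOLO + FaceNet for
# identifying who appears in the video and Whisper for transcribing speech.
# They are stubbed here so the overall control flow is runnable.
def identify_faces(video_path: str) -> List[str]:
    return ["Public Figure A"]          # placeholder identity list

def transcribe_speech(video_path: str) -> str:
    return "I endorse this product."    # placeholder transcript

@dataclass
class Verdict:
    agent: str
    is_fake: bool
    rationale: str

def linguistic_agent(transcript: str, identities: List[str]) -> Verdict:
    # A real agent would prompt an LLM (e.g. DeepSeek R-1) to judge whether
    # the statement is linguistically plausible for the identified speaker.
    suspicious = "endorse" in transcript
    return Verdict("linguistic", suspicious, "endorsement atypical of speaker")

def contextual_agent(transcript: str, identities: List[str]) -> Verdict:
    # A real agent would cross-check the claim against known context.
    return Verdict("contextual", True, "no record of this statement")

def detect(video_path: str, agents: List[Callable[..., Verdict]]) -> bool:
    identities = identify_faces(video_path)       # stage 1a: who appears
    transcript = transcribe_speech(video_path)    # stage 1b: what is said
    verdicts = [a(transcript, identities) for a in agents]  # stage 2: LLM agents
    # Aggregate agent verdicts by simple majority vote (assumed rule).
    return sum(v.is_fake for v in verdicts) > len(verdicts) / 2
```

With these stubs, `detect("clip.mp4", [linguistic_agent, contextual_agent])` flags the clip as fake because both agents return a fake verdict.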
Datasets
SocialDF: A dataset of 2,126 short-form videos (1,071 real, 1,055 deepfakes) sourced from social media platforms like Instagram Reels and Stories, featuring high-fidelity deepfakes and diverse contexts.
Model(s)
YOLO (for face detection), FaceNet (for facial feature extraction), Whisper (for speech transcription), Llama 3.3, Qwen, and DeepSeek R-1 (Large Language Models). LipFD was used as a baseline model for comparison.
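The identity-matching step implied by the FaceNet component can be illustrated with a small sketch: a face embedding is compared against a gallery of known public figures by cosine similarity, and the best match above a threshold is returned. The gallery names, the toy 3-D vectors, and the 0.7 threshold are all illustrative assumptions; real FaceNet embeddings are 512-dimensional.

```python
import math

# Hypothetical gallery of known identities -> embedding vectors.
# Toy 3-D vectors for illustration only.
GALLERY = {
    "Figure A": [0.9, 0.1, 0.0],
    "Figure B": [0.0, 0.8, 0.6],
}

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def match_identity(embedding, gallery=GALLERY, threshold=0.7):
    # Return the best-matching gallery name, or None if nothing is close enough.
    name, score = max(((n, cosine(embedding, e)) for n, e in gallery.items()),
                      key=lambda t: t[1])
    return name if score >= threshold else None
```

An embedding close to a gallery entry resolves to that identity; an embedding far from every entry resolves to `None`, which would mark the face as unrecognized.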
Author countries
India