Multiverse Through Deepfakes: The MultiFakeVerse Dataset of Person-Centric Visual and Conceptual Manipulations
Authors: Parul Gupta, Shreya Ghosh, Tom Gedeon, Thanh-Toan Do, Abhinav Dhall
Published: 2025-06-01 07:17:16+00:00
AI Summary
This paper introduces MultiFakeVerse, a large-scale dataset of person-centric deepfakes generated using vision-language models (VLMs). Rather than simple identity swaps, the dataset focuses on subtle manipulations that alter scene meaning, which prove challenging for both existing deepfake detection models and human observers.
Abstract
The rapid advancement of GenAI technology over the past few years has contributed significantly to highly realistic deepfake content generation. Despite ongoing efforts, the research community still lacks a large-scale, reasoning-driven deepfake benchmark dataset specifically tailored to person-centric object, context, and scene manipulations. In this paper, we address this gap by introducing MultiFakeVerse, a large-scale person-centric deepfake dataset comprising 845,286 images, with both the manipulation suggestions and the image manipulations derived from vision-language models (VLMs). The VLM instructions specifically target modifications to individuals or to contextual elements of a scene that influence human perception of importance, intent, or narrative. This VLM-driven approach enables semantic, context-aware alterations, such as modified actions, scenes, and human-object interactions, rather than the synthetic or low-level identity swaps and region-specific edits common in existing datasets. Our experiments reveal that current state-of-the-art deepfake detection models and human observers struggle to detect these subtle yet meaningful manipulations. The code and dataset are available on GitHub: https://github.com/Parul-Gupta/MultiFakeVerse.
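To make the abstract's pipeline concrete, below is a minimal sketch of the first stage: asking a VLM to propose one person-centric, narrative-altering edit for an image. The abstract does not name the VLM or the prompt used; the OpenAI client, the "gpt-4o" model name, the prompt wording, and the suggest_manipulation helper here are all illustrative assumptions, not the authors' actual pipeline.

```python
import base64
from openai import OpenAI  # pip install openai

# Stand-in VLM endpoint; assumes OPENAI_API_KEY is set in the environment.
client = OpenAI()

# Hypothetical prompt in the spirit of the paper: request a subtle,
# semantic, person-centric manipulation rather than an identity swap.
PROMPT = (
    "You are editing a photo of a person. Suggest one subtle manipulation "
    "of an action, scene element, or human-object interaction that changes "
    "the perceived importance, intent, or narrative of the scene, without "
    "swapping anyone's identity. Reply with a single imperative editing "
    "instruction."
)

def suggest_manipulation(image_path: str, model: str = "gpt-4o") -> str:
    """Ask a VLM for one semantic, context-aware edit suggestion."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": PROMPT},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content.strip()

if __name__ == "__main__":
    # The returned instruction (e.g. "place a folded banknote in the
    # man's outstretched hand") would then be handed to an
    # instruction-following image editor to render the manipulated image.
    print(suggest_manipulation("example.jpg"))
```

In the dataset's second stage, such an instruction would drive a VLM-based image editor; that rendering step is omitted here since the abstract does not specify which editing model is used.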