Multiverse Through Deepfakes: The MultiFakeVerse Dataset of Person-Centric Visual and Conceptual Manipulations

Authors: Parul Gupta, Shreya Ghosh, Tom Gedeon, Thanh-Toan Do, Abhinav Dhall

Published: 2025-06-01 07:17:16+00:00

AI Summary

This paper introduces MultiFakeVerse, a large-scale dataset of person-centric deepfakes generated using vision-language models (VLMs). The dataset focuses on subtle manipulations impacting scene meaning rather than simple identity swaps, posing a significant challenge to existing deepfake detection models and human observers.

Abstract

The rapid advancement of GenAI technology over the past few years has significantly contributed towards highly realistic deepfake content generation. Despite ongoing efforts, the research community still lacks a large-scale, reasoning-capability-driven deepfake benchmark dataset specifically tailored for person-centric object, context and scene manipulations. In this paper, we address this gap by introducing MultiFakeVerse, a large-scale person-centric deepfake dataset comprising 845,286 images generated through manipulation suggestions and image manipulations both derived from vision-language models (VLMs). The VLM instructions were specifically targeted towards modifications to individuals or contextual elements of a scene that influence human perception of importance, intent, or narrative. This VLM-driven approach enables semantic, context-aware alterations such as modifying actions, scenes, and human-object interactions rather than synthetic or low-level identity swaps and region-specific edits that are common in existing datasets. Our experiments reveal that current state-of-the-art deepfake detection models and human observers struggle to detect these subtle yet meaningful manipulations. The code and dataset are available at https://github.com/Parul-Gupta/MultiFakeVerse.


Key findings
State-of-the-art deepfake detection models and human observers struggle to detect the subtle manipulations in MultiFakeVerse. Fine-tuning detectors on MultiFakeVerse improves performance, but limitations remain, particularly in localizing the manipulated regions. The dataset shows that semantically meaningful manipulations are considerably harder to detect than simple identity swaps.
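To illustrate the image-level evaluation behind these findings, the following is a minimal Python sketch of computing real-vs-fake accuracy on a MultiFakeVerse-style split. The predict_is_fake() function is a hypothetical stand-in for any of the detectors listed under Model(s), and the two-folder layout is an assumption, not the dataset's official protocol.

# Hedged sketch of image-level real-vs-fake evaluation.
# predict_is_fake() is a placeholder for a detector such as CNNSpot,
# TruFor, or SIDA-13B; the directory layout is assumed, not official.

from pathlib import Path
from PIL import Image

def predict_is_fake(image: Image.Image) -> bool:
    """Placeholder for a deepfake detector's binary decision."""
    raise NotImplementedError("Plug a detector in here.")

def image_level_accuracy(real_dir: str, fake_dir: str) -> float:
    """Accuracy over a folder of real images and a folder of fakes."""
    correct, total = 0, 0
    for is_fake, folder in ((False, real_dir), (True, fake_dir)):
        for path in sorted(Path(folder).glob("*.jpg")):
            image = Image.open(path).convert("RGB")
            correct += int(predict_is_fake(image) == is_fake)
            total += 1
    return correct / total if total else 0.0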
Approach
The authors use VLMs at both stages of generation: given an image, a VLM proposes a subtle, person-centric manipulation of individuals or contextual scene elements, and image-generation or editing models then apply it. The modifications aim to change the perceived narrative, intent, or importance of the scene without overt identity changes. The generated images are then analyzed for their perceptual impact on detection models and human observers.
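To make the two-stage pipeline concrete, here is a minimal Python sketch of the workflow described above. The helpers suggest_manipulation() and apply_edit() are hypothetical placeholders for the VLM that proposes the edit and the instruction-based image editor (e.g., Gemini-2.0-Flash-Image-Generation, GPT-Image-1, ICEdit) that applies it; this is not the authors' released code.

# Minimal sketch of the VLM-driven generation pipeline (assumed structure).

from pathlib import Path
from PIL import Image

# Example of a person-centric edit prompt; the exact wording is assumed.
EDIT_PROMPT = (
    "Suggest one subtle edit to this image that changes how the main "
    "person's role, intent, or the scene's narrative is perceived, "
    "without replacing their identity. Reply with a single instruction."
)

def suggest_manipulation(image: Image.Image, prompt: str = EDIT_PROMPT) -> str:
    """Query a vision-language model for a person-centric edit instruction."""
    raise NotImplementedError("Wrap your VLM API call here.")

def apply_edit(image: Image.Image, instruction: str) -> Image.Image:
    """Send the image and instruction to an instruction-based image editor."""
    raise NotImplementedError("Wrap your image-editing model call here.")

def generate_fake(src_path: str, out_dir: str) -> Path:
    image = Image.open(src_path).convert("RGB")
    instruction = suggest_manipulation(image)   # stage 1: VLM proposes the edit
    edited = apply_edit(image, instruction)     # stage 2: editor applies it
    out_path = Path(out_dir) / f"fake_{Path(src_path).name}"
    edited.save(out_path)
    return out_path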
Datasets
EMOTIC, PISC, PIPA, PIC 2.0
Model(s)
CNNSpot, AntifakePrompt, TruFor, SIDA-13B, Gemini-2.0-Flash-Image-Generation, GPT-Image-1, ICEdit, ShareGPT4V, Long-CLIP
Author countries
Australia