Linguistic Profiling of Deepfakes: An Open Database for Next-Generation Deepfake Detection


Authors: Yabin Wang, Zhiwu Huang, Zhiheng Ma, Xiaopeng Hong

Published: 2024-01-04 16:19:52+00:00

AI Summary

This paper introduces DFLIP-3K, a large-scale deepfake database containing 300K diverse deepfake samples from approximately 3K generative models and their corresponding linguistic footprints (190K prompts). This benchmark promotes research on linguistic profiling for deepfake detection, model identification, and prompt prediction, addressing limitations of existing datasets and methods.

Abstract

The emergence of text-to-image generative models has revolutionized the field of deepfakes, enabling the creation of realistic and convincing visual content directly from textual descriptions. However, this advancement makes verifying the authenticity of such content considerably more challenging. Existing deepfake detection datasets and methods often fall short in capturing the extensive range of emerging deepfakes and in offering satisfactory explanatory information for detection. To address this issue, this paper introduces a deepfake database (DFLIP-3K) for developing convincing and explainable deepfake detection. It encompasses about 300K diverse deepfake samples from approximately 3K generative models, the largest number of deepfake models in the literature. Moreover, it collects around 190K linguistic footprints of these deepfakes. These two distinguishing features enable DFLIP-3K to support a benchmark that promotes progress in the linguistic profiling of deepfakes, comprising three sub-tasks: deepfake detection, model identification, and prompt prediction. The deepfake model and the prompt are two essential components of each deepfake, so dissecting them linguistically allows an invaluable exploration of trustworthy and interpretable evidence in deepfake detection, which we believe is key to next-generation deepfake detection. Furthermore, DFLIP-3K is envisioned as an open database that fosters transparency and encourages collaborative efforts to expand it further. Our extensive experiments on the developed benchmark verify that the DFLIP-3K database can serve as a standardized resource for evaluating and comparing linguistic-based deepfake detection, identification, and prompt prediction techniques.

Key findings

Vision-language models (such as Flamingo and CLIP) outperformed traditional vision models (such as ResNet and ViT) in deepfake detection and model identification. Flamingo, in particular, performed best in prompt prediction: images regenerated from its predicted prompts were closer to the reference deepfakes, demonstrating the value of linguistic profiling for explainable deepfake detection.
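The idea of scoring prompt prediction by how close the regenerated image is to the reference deepfake can be sketched as an average embedding similarity. This is a minimal illustration only, assuming that each image has already been mapped to a feature vector by some image encoder and that cosine similarity is the closeness measure; the function names are hypothetical and this is not necessarily the metric used in the paper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero vector has no direction")
    return dot / (norm_a * norm_b)


def mean_reconstruction_similarity(
    reference_embs: Sequence[Sequence[float]],
    reconstructed_embs: Sequence[Sequence[float]],
) -> float:
    """Average similarity between each reference deepfake's embedding and the
    embedding of the image regenerated from the predicted prompt
    (hypothetical metric; higher means closer)."""
    if len(reference_embs) != len(reconstructed_embs) or not reference_embs:
        raise ValueError("need the same non-zero number of references and reconstructions")
    scores = [cosine_similarity(r, g) for r, g in zip(reference_embs, reconstructed_embs)]
    return sum(scores) / len(scores)


# Toy 2-D embeddings: predictor A's reconstructions point in nearly the same
# direction as the references; predictor B's point in orthogonal directions.
refs = [[1.0, 0.0], [0.0, 1.0]]
predictor_a = [[0.9, 0.1], [0.1, 0.9]]
predictor_b = [[0.0, 1.0], [1.0, 0.0]]
print(mean_reconstruction_similarity(refs, predictor_a))
print(mean_reconstruction_similarity(refs, predictor_b))
```

Under this assumed measure, a prompt predictor whose regenerated images land closer to the references receives the higher average score.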

Approach

The authors created a new deepfake dataset (DFLIP-3K) with a large number of diverse deepfake samples and their associated prompts. They then established a benchmark for three sub-tasks: deepfake detection, model identification, and prompt prediction, evaluating vision-based and vision-language models on this benchmark.
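The three sub-tasks share one pool of samples but use different labels: detection is binary (real vs. fake), identification is multi-class over source models and only applies to fakes, and prompt prediction targets the generating prompt. The sketch below illustrates that split with accuracy for the first two tasks. It assumes a hypothetical record schema (`Sample`, with fields `is_fake`, `model`, `prompt`) and hypothetical model names; it is not DFLIP-3K's actual file format or the paper's evaluation code.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Sample:
    """One benchmark item (hypothetical schema, not DFLIP-3K's real format)."""
    image_id: str
    is_fake: bool                 # label for deepfake detection
    model: Optional[str] = None   # label for model identification (fakes only)
    prompt: Optional[str] = None  # target for prompt prediction (fakes only)


def detection_accuracy(samples: Sequence[Sample], predicted_fake: Sequence[bool]) -> float:
    """Binary real-vs-fake accuracy over all samples."""
    if len(samples) != len(predicted_fake) or not samples:
        raise ValueError("need one prediction per sample")
    correct = sum(s.is_fake == p for s, p in zip(samples, predicted_fake))
    return correct / len(samples)


def identification_accuracy(samples: Sequence[Sample], predicted_model: Sequence[str]) -> float:
    """Multi-class accuracy restricted to fake samples; predictions made for
    real samples are ignored because real images have no source model."""
    if len(samples) != len(predicted_model):
        raise ValueError("need one prediction per sample")
    pairs = [(s.model, p) for s, p in zip(samples, predicted_model) if s.is_fake]
    if not pairs:
        raise ValueError("no fake samples to evaluate")
    return sum(truth == pred for truth, pred in pairs) / len(pairs)


samples = [
    Sample("img-1", True, model="model-x", prompt="a cat in the rain"),
    Sample("img-2", True, model="model-y", prompt="a castle at dusk"),
    Sample("img-3", False),  # a real image: no model, no prompt
]
print(detection_accuracy(samples, [True, False, False]))
print(identification_accuracy(samples, ["model-x", "model-x", "unused"]))
```

Keeping the model and prompt optional on each record lets the same sample list feed all three sub-tasks, with each metric filtering to the samples its label applies to.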

Datasets

DFLIP-3K (created by the authors), LAION-5B, DiffusionDB, MidJourney User Prompts & Generated Images (250k), data from DALL-E, Imagen, and Parti.

Model(s)

ResNet-50, ViT-base-16, CLIP, Flamingo, BLIP

Author countries

China, United Kingdom
