Conditioned Prompt-Optimization for Continual Deepfake Detection

Authors: Francesco Laiti, Benedetta Liberatori, Thomas De Min, Elisa Ricci

Published: 2024-07-31 12:22:57+00:00

AI Summary

Prompt2Guard is a novel exemplar-free continual deepfake detection method using Vision-Language Models (VLMs) and multimodal prompts. It improves efficiency and accuracy by using a prediction ensembling technique with read-only prompts, avoiding multiple forward passes and achieving state-of-the-art results on the CDDB-Hard benchmark.

Abstract

The rapid advancement of generative models has significantly enhanced the realism and customization of digital content creation. The increasing power of these tools, coupled with their ease of access, fuels the creation of photorealistic fake content, termed deepfakes, which raises substantial concerns about potential misuse. In response, there has been notable progress in developing mechanisms to detect content produced by these systems. However, existing methods often struggle to adapt to the continuously evolving landscape of deepfake generation. This paper introduces Prompt2Guard, a novel solution for exemplar-free continual deepfake detection of images that leverages Vision-Language Models (VLMs) and domain-specific multimodal prompts. Unlike previous VLM-based approaches, which are either bounded by prompt-selection accuracy or require multiple forward passes, we leverage a prediction ensembling technique with read-only prompts. Read-only prompts do not interact with the VLM's internal representations, avoiding the need for multiple forward passes. Thus, we improve both efficiency and accuracy in detecting generated content. Additionally, our method exploits text-prompt conditioning tailored to deepfake detection, which we show is beneficial in our setting. We evaluate Prompt2Guard on CDDB-Hard, a continual deepfake detection benchmark composed of five deepfake detection datasets spanning multiple domains and generators, and achieve a new state of the art. Our results underscore the effectiveness of our approach in addressing the challenges of continual deepfake detection, paving the way for more robust and adaptable deepfake detection solutions.
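The read-only idea can be illustrated with a minimal single-layer sketch, assuming the attention-masking scheme of read-only prompt optimization: original image tokens may not attend to prompt tokens, while prompt tokens may read the original tokens. It uses one attention head with identity query/key/value projections, and the rule that each task's prompt group is also hidden from the other groups is our own assumption. `masked_self_attention` and `read_only_mask` are hypothetical helpers, not Prompt2Guard's code. Because the original tokens never see the prompts, their outputs match a prompt-free pass, so prompts for several tasks can share a single forward pass.

```python
import numpy as np

def masked_self_attention(x, mask):
    """One attention head with identity Q/K/V projections (a simplification).
    mask[i, j] is True when token i may attend to token j."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    scores = np.where(mask, scores, -np.inf)        # block disallowed pairs
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)                        # exp(-inf) == 0
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

def read_only_mask(n_orig, group_sizes):
    """Original tokens read only original tokens; each prompt group reads the
    original tokens plus itself (group isolation is an assumption)."""
    n = n_orig + sum(group_sizes)
    mask = np.zeros((n, n), dtype=bool)
    mask[:, :n_orig] = True                         # everyone may read originals
    start = n_orig
    for size in group_sizes:
        mask[start:start + size, start:start + size] = True
        start += size
    return mask

rng = np.random.default_rng(0)
n_orig, dim = 5, 8
image_tokens = rng.normal(size=(n_orig, dim))
prompts_task1 = rng.normal(size=(2, dim))  # hypothetical learned prompts, task 1
prompts_task2 = rng.normal(size=(2, dim))  # hypothetical learned prompts, task 2

# Reference: original tokens alone, with full attention among themselves.
alone = masked_self_attention(image_tokens, np.ones((n_orig, n_orig), dtype=bool))

# One forward pass with both tasks' prompts appended.
x = np.concatenate([image_tokens, prompts_task1, prompts_task2])
out = masked_self_attention(x, read_only_mask(n_orig, [2, 2]))

print(np.allclose(out[:n_orig], alone))  # True: original tokens are untouched
task1_out = out[n_orig:n_orig + 2]       # task 1 prompt outputs
task2_out = out[n_orig + 2:]             # task 2 prompt outputs
```

In a full transformer the same mask would be applied at every layer, so the original tokens' features would stay identical to those of the frozen backbone throughout, while each prompt group collects its own task-specific features.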


Key findings
Prompt2Guard achieves state-of-the-art results on the CDDB-Hard benchmark, outperforming previous methods in task-wise average accuracy. The text-prompt conditioning and prediction ensembling techniques significantly contribute to the improved performance. The method demonstrates robustness to continual learning challenges.
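Continual-learning results are commonly summarized from an accuracy matrix whose entry (i, j) is the accuracy on task j after training on task i. The sketch below assumes the usual definitions: average accuracy is the mean of the last row, and average forgetting is the mean drop from each earlier task's best accuracy to its final accuracy. These definitions, the helper names, and the toy numbers are illustrative assumptions, not the paper's exact protocol or reported results.

```python
import numpy as np

def average_accuracy(acc):
    """acc[i, j]: accuracy on task j after training on task i.
    Returns the mean accuracy over all tasks after the final task."""
    acc = np.asarray(acc, dtype=float)
    return float(acc[-1].mean())

def average_forgetting(acc):
    """Mean drop from each earlier task's best accuracy to its final accuracy."""
    acc = np.asarray(acc, dtype=float)
    last = acc.shape[0] - 1
    drops = [acc[j:last, j].max() - acc[last, j] for j in range(last)]
    return float(np.mean(drops))

# Toy 3-task matrix; NaN marks tasks not yet seen (illustrative numbers only).
acc = [
    [0.90, np.nan, np.nan],
    [0.85, 0.80, np.nan],
    [0.80, 0.75, 0.95],
]
print(f"AA = {average_accuracy(acc):.3f}, AF = {average_forgetting(acc):.3f}")
```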
Approach
Prompt2Guard builds on a pre-trained CLIP model and employs read-only prompts that do not alter the VLM's internal representations. It combines the predictions of prompts learned on different tasks through a prediction ensembling technique, and applies text-prompt conditioning tailored to deepfake detection to improve accuracy and efficiency.
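The ensembling step can be sketched under stated assumptions: each task's read-only prompts yield one image embedding, each task has its own "real" and "fake" text embeddings, class scores are CLIP-style cosine similarities multiplied by a logit scale of 100 (the value of CLIP's pretrained logit scale), and per-task probabilities are combined by a plain mean. The averaging rule, the shapes, and the helper name `ensemble_predict` are hypothetical; the paper's exact combination and text-conditioning scheme may differ. Random vectors stand in for real CLIP ViT-B/16 features (512-dimensional embeddings).

```python
import numpy as np

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def ensemble_predict(image_feats, text_feats, logit_scale=100.0):
    """image_feats: (T, D), one image embedding per task-specific prompt set.
    text_feats: (T, 2, D), [real, fake] text embeddings for each task.
    Returns the averaged class probabilities (2,) and the predicted label."""
    img = l2_normalize(image_feats)                            # (T, D)
    txt = l2_normalize(text_feats)                             # (T, 2, D)
    logits = logit_scale * np.einsum("td,tcd->tc", img, txt)   # cosine sims, (T, 2)
    probs = softmax(logits, axis=-1)       # per-task real/fake probabilities
    mean_probs = probs.mean(axis=0)        # ensemble over tasks (assumed rule)
    return mean_probs, int(mean_probs.argmax())

rng = np.random.default_rng(0)
T, D = 5, 512                              # five tasks, as in CDDB-Hard
image_feats = rng.normal(size=(T, D))
text_feats = rng.normal(size=(T, 2, D))
probs, label = ensemble_predict(image_feats, text_feats)
print(probs, "fake" if label == 1 else "real")
```

Averaging probabilities rather than picking a single task's prompt means no prompt-selection step is needed, which is the failure mode the abstract attributes to earlier selection-based approaches.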
Datasets
CDDB-Hard (continual deepfake detection benchmark composed of five datasets: GauGAN, BigGAN, WildDeepfake, WhichFaceReal, and SAN)
Model(s)
CLIP (ViT-B/16)
Author countries
Italy