TruthLens: A Training-Free Paradigm for DeepFake Detection

Authors: Ritabrata Chakraborty, Rajatsubhra Chakraborty, Ali Khaleghi Rahimian, Thomas MacDougall

Published: 2025-03-19 15:41:32+00:00

AI Summary

TruthLens is a training-free deepfake detection framework that uses large vision-language models (LVLMs) and large language models (LLMs) to analyze visual artifacts and provide interpretable explanations for its classifications. It reimagines deepfake detection as a visual question-answering (VQA) task, achieving high accuracy while offering transparent reasoning.

Abstract

The proliferation of synthetic images generated by advanced AI models poses significant challenges in identifying and understanding manipulated visual content. Current fake image detection methods predominantly rely on binary classification models that focus on accuracy while often neglecting interpretability, leaving users without clear insights into why an image is deemed real or fake. To bridge this gap, we introduce TruthLens, a novel training-free framework that reimagines deepfake detection as a visual question-answering (VQA) task. TruthLens utilizes state-of-the-art large vision-language models (LVLMs) to observe and describe visual artifacts and combines this with the reasoning capabilities of large language models (LLMs) like GPT-4 to analyze and aggregate evidence into informed decisions. By adopting a multimodal approach, TruthLens seamlessly integrates visual and semantic reasoning to not only classify images as real or fake but also provide interpretable explanations for its decisions. This transparency enhances trust and provides valuable insights into the artifacts that signal synthetic content. Extensive evaluations demonstrate that TruthLens outperforms conventional methods, achieving high accuracy on challenging datasets while maintaining a strong emphasis on explainability. By reframing deepfake detection as a reasoning-driven process, TruthLens establishes a new paradigm in combating synthetic media, combining cutting-edge performance with interpretability to address the growing threats of visual disinformation.


Key findings
TruthLens outperforms existing detectors (DIRE and CNNDetection) in both AUC and classification accuracy on the LDM and ProGAN datasets. Adding visual-cue prompts and an LLM reasoning step yields a significant accuracy gain, particularly on the more challenging LDM dataset. Ablation studies show that focusing on specific visual cues, such as "Eyes and Pupils," further improves detection accuracy.
Approach
TruthLens employs a four-step pipeline: (1) generating predefined prompts about visual cues, (2) using a multimodal model to answer prompts based on the image, (3) aggregating the answers into a structured summary, and (4) using an LLM to classify the image and provide a reasoned explanation.
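Below is a minimal sketch of that four-step pipeline. The prompt wording, the `query_lvlm`/`query_llm` helper signatures, and the aggregation format are illustrative assumptions, not the authors' exact implementation; in practice the helpers would wrap an LVLM such as LLaVA or BLIP-2 and an LLM such as GPT-4.

```python
from typing import Callable

# Step 1: predefined prompts probing visual cues associated with synthetic
# faces. The "Eyes and Pupils" cue comes from the paper's ablation; the
# exact wording here is a hypothetical illustration.
VISUAL_CUE_PROMPTS = [
    "Are the eyes and pupils symmetric and naturally shaped?",
    "Does the skin texture show unnatural smoothness or repeating patterns?",
    "Are hair strands and background boundaries rendered consistently?",
]

def classify_image(
    image_path: str,
    query_lvlm: Callable[[str, str], str],  # (image_path, prompt) -> answer
    query_llm: Callable[[str], str],        # (evidence summary) -> verdict
) -> str:
    # Step 2: the multimodal model answers each prompt from the image alone.
    answers = [(p, query_lvlm(image_path, p)) for p in VISUAL_CUE_PROMPTS]

    # Step 3: aggregate the question-answer pairs into a structured summary.
    summary = "\n".join(f"Q: {p}\nA: {a}" for p, a in answers)

    # Step 4: the LLM weighs the aggregated evidence and returns a
    # real/fake verdict together with a reasoned explanation.
    verdict_prompt = (
        "Based on the following observations about an image, decide whether "
        "it is real or AI-generated, and explain your reasoning.\n\n" + summary
    )
    return query_llm(verdict_prompt)

# Usage with stubbed models (replace the lambdas with real LVLM/LLM clients):
verdict = classify_image(
    "face.png",
    query_lvlm=lambda img, q: "The pupils differ in shape and reflectivity.",
    query_llm=lambda s: "Fake: inconsistent pupil geometry suggests synthesis.",
)
```

Because the two model calls are passed in as plain callables, the same pipeline skeleton works with any LVLM/LLM pairing, which matches the paper's training-free framing: no component is fine-tuned, only queried.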
Datasets
LDM Dataset (1000 fake images generated by Latent Diffusion Models and 1000 real images from FFHQ); ProGAN Dataset (1000 fake images generated by ProGAN, sourced from ForgeryNet)
Model(s)
LVLMs: LLaVA, BLIP-2, Chat-UniVi, and CogVLM; LLM: GPT-4 (cited as an example)
Author countries
India, USA