LayLens: Improving Deepfake Understanding through Simplified Explanations

Authors: Abhijeet Narang, Parul Gupta, Liuyijia Su, Abhinav Dhall

Published: 2025-07-14 08:52:03+00:00

AI Summary

LayLens is a tool designed to make deepfake understanding accessible to non-experts. It uses a three-stage pipeline: explainable deepfake detection, natural language simplification of technical explanations, and visual reconstruction of a plausible original image. A user study demonstrated that simplified explanations significantly improved clarity and user confidence.

Abstract

This demonstration paper presents $\mathbf{LayLens}$, a tool aimed at making deepfake understanding easier for users of all educational backgrounds. While prior works often rely on outputs containing technical jargon, LayLens bridges the gap between model reasoning and human understanding through a three-stage pipeline: (1) explainable deepfake detection using a state-of-the-art forgery localization model, (2) natural language simplification of technical explanations using a vision-language model, and (3) visual reconstruction of a plausible original image via guided image editing. The interface presents both technical and layperson-friendly explanations alongside a side-by-side comparison of the uploaded and reconstructed images. A user study with 15 participants shows that simplified explanations significantly improve clarity and reduce cognitive load, with most users expressing increased confidence in identifying deepfakes. LayLens offers a step toward transparent, trustworthy, and user-centric deepfake forensics.


Key findings
A user study with 15 participants showed that simplified explanations significantly reduced cognitive load and increased user confidence in identifying deepfakes. Users preferred simplified explanations in 65.3% of cases, and the visual reconstruction aided understanding in 69.3% of instances. The study showed statistically significant improvements in ease of understanding and clarity when using simplified explanations.
Approach
LayLens employs a three-stage pipeline. First, a state-of-the-art forgery localization model detects deepfakes and highlights manipulated regions. Second, a vision-language model simplifies the technical explanation into layperson-friendly terms. Finally, a guided image editing model reconstructs a plausible original image.
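The three-stage flow described above can be sketched as a minimal pipeline. This is an illustrative toy, not the authors' implementation: the stage functions below are hypothetical stand-ins for the actual models (forgery localization, vision-language simplification, and guided image editing), and all names and return types are assumptions made for clarity.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DetectionResult:
    is_fake: bool
    manipulated_regions: list   # toy stand-in for a localization mask
    technical_report: str       # jargon-heavy model explanation

# Stage 1: forgery localization (hypothetical stub standing in for the
# detection model; a real system would run inference on the image).
def localize_forgery(image: str) -> DetectionResult:
    return DetectionResult(
        is_fake=True,
        manipulated_regions=["mouth"],
        technical_report="High-frequency blending artifacts near mouth boundary",
    )

# Stage 2: simplify the technical explanation (stand-in for prompting a
# vision-language model to produce layperson-friendly text).
def simplify_explanation(report: str) -> str:
    return f"In plain terms: {report.lower()}, which suggests this area was edited."

# Stage 3: reconstruct a plausible original image via guided editing
# (stand-in for an image-editing model conditioned on the localized regions).
def reconstruct_original(image: str, regions: list) -> dict:
    return {"image": image, "restored_regions": regions}

def laylens_pipeline(image: str):
    det = localize_forgery(image)
    simple = simplify_explanation(det.technical_report)
    recon = reconstruct_original(image, det.manipulated_regions) if det.is_fake else None
    return det, simple, recon
```

A caller would then show `det.technical_report` and `simple` side by side, together with the uploaded image and the reconstruction, mirroring the interface the paper describes.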
Datasets
UNKNOWN
Model(s)
FakeShield, Step1X-Edit, Qwen-VL (Vision Language Model)
Author countries
Australia