CorrDetail: Visual Detail Enhanced Self-Correction for Face Forgery Detection

Authors: Binjia Zhou, Hengrui Lou, Lizhe Chen, Haoyuan Li, Dawei Luo, Shuai Chen, Jie Lei, Zunlei Feng, Yijun Bei

Published: 2025-07-07 06:29:57+00:00

AI Summary

CorrDetail, a visual detail enhanced self-correction framework, improves face forgery detection by rectifying authentic forgery details using error-guided questioning and incorporating a visual fine-grained detail enhancement module. This approach achieves state-of-the-art performance and provides accurate identification of forged details.

Abstract

With the swift progression of image generation technology, the widespread emergence of facial deepfakes poses significant challenges to the field of security, thus amplifying the urgent need for effective deepfake detection. Existing techniques for face forgery detection can broadly be categorized into two primary groups: visual-based methods and multimodal approaches. The former often lacks clear explanations for forgery details, while the latter, which merges visual and linguistic modalities, is more prone to the issue of hallucinations. To address these shortcomings, we introduce a visual detail enhanced self-correction framework, designated CorrDetail, for interpretable face forgery detection. CorrDetail is meticulously designed to rectify authentic forgery details when provided with error-guided questioning, with the aim of fostering the ability to uncover forgery details rather than yielding hallucinated responses. Additionally, to bolster the reliability of its findings, a visual fine-grained detail enhancement module is incorporated, supplying CorrDetail with more precise visual forgery details. Ultimately, a fusion decision strategy is devised to further augment the model's discriminative capacity in handling extreme samples, through the integration of visual information compensation and model bias reduction. Experimental results demonstrate that CorrDetail not only achieves state-of-the-art performance compared to the latest methodologies but also excels in accurately identifying forged details, all while exhibiting robust generalization capabilities.


Key findings
CorrDetail achieves state-of-the-art performance on several benchmark datasets, exceeding existing methods in accuracy and generalization. The method effectively identifies forged details and shows robustness against various forgery techniques. Ablation studies confirm the effectiveness of each module in the proposed framework.
Approach
CorrDetail uses a self-correction visual question answering (SCVQA) strategy to train a visual language model (VLM). It incorporates a cross-model forgery detail enhancement module to provide precise visual cues and a fusion decision strategy to improve handling of extreme samples.
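To make the idea of a fusion decision concrete, here is a minimal Python sketch. It is not the authors' implementation: the summary only states that the strategy combines visual information compensation with model bias reduction, so the function name `fuse_scores`, the parameters `vlm_bias` and `visual_weight`, and the logit-space weighted average are all assumptions chosen for illustration.

```python
import math


def logit(p, eps=1e-6):
    """Convert a probability to a logit, clipping to avoid infinities."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Convert a logit back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fuse_scores(vlm_fake_prob, visual_fake_prob, vlm_bias=0.0, visual_weight=0.5):
    """Hypothetical fusion of a VLM score with a visual-branch score.

    vlm_fake_prob    -- probability of "fake" from the language-model branch
    visual_fake_prob -- probability of "fake" from a visual detail branch
    vlm_bias         -- assumed constant logit offset subtracted from the VLM
                        score (a stand-in for "model bias reduction")
    visual_weight    -- assumed mixing weight for the visual branch
                        (a stand-in for "visual information compensation")
    """
    debiased_vlm = logit(vlm_fake_prob) - vlm_bias
    visual = logit(visual_fake_prob)
    fused = (1.0 - visual_weight) * debiased_vlm + visual_weight * visual
    return sigmoid(fused)


if __name__ == "__main__":
    # A borderline VLM answer is pulled toward "fake" by a confident visual cue.
    print(fuse_scores(0.55, 0.95))
```

Averaging in logit space rather than probability space is one common design choice for combining classifier outputs; the paper may well use a different rule.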
Datasets
FF++, CelebDF, WildDeepfake (WDF), DeeperForensics-1.0 (DFR), DeepFaceGen, and a new SCVQA dataset created by the authors based on FF++
Model(s)
LLaVA-1.5-7B (a VLM), Vision Transformer (ViT), CLIP
Author countries
China