On the Holistic Approach for Detecting Human Image Forgery

Authors: Xiao Guo, Jie Zhu, Anil Jain, Xiaoming Liu

Published: 2026-01-08 08:33:22+00:00

Comment: 6 figures, 5 tables

AI Summary

This paper introduces HuForDet, a holistic framework for detecting human image forgeries that addresses both facial manipulations and full-body synthetic images. HuForDet employs a dual-branch architecture: a face forgery detection branch with heterogeneous experts operating in the RGB and frequency domains (including an adaptive Laplacian-of-Gaussian module), and a contextualized forgery detection branch leveraging a Multi-Modal Large Language Model with a confidence estimation mechanism for semantic consistency analysis. The framework demonstrates state-of-the-art performance and superior robustness across diverse human image forgeries on the newly curated HuFor dataset.

Abstract

The rapid advancement of AI-generated content (AIGC) has escalated the threat of deepfakes, from facial manipulations to the synthesis of entire photorealistic human bodies. However, existing detection methods remain fragmented, specializing either in facial-region forgeries or full-body synthetic images, and consequently fail to generalize across the full spectrum of human image manipulations. We introduce HuForDet, a holistic framework for human image forgery detection, which features a dual-branch architecture comprising: (1) a face forgery detection branch that employs heterogeneous experts operating in both RGB and frequency domains, including an adaptive Laplacian-of-Gaussian (LoG) module designed to capture artifacts ranging from fine-grained blending boundaries to coarse-scale texture irregularities; and (2) a contextualized forgery detection branch that leverages a Multi-Modal Large Language Model (MLLM) to analyze full-body semantic consistency, enhanced with a confidence estimation mechanism that dynamically weights its contribution during feature fusion. We curate a human image forgery (HuFor) dataset that unifies existing face forgery data with a new corpus of full-body synthetic humans. Extensive experiments show that our HuForDet achieves state-of-the-art forgery detection performance and superior robustness across diverse human image forgeries.


Key findings
HuForDet achieved state-of-the-art overall AUC of 90.22% on the HuFor dataset, demonstrating significant improvement over existing methods. It showed superior robustness and generalization across diverse forgery types, effectively detecting both partial facial manipulations and full-body synthetic images. The confidence-aware dynamic fusion mechanism was crucial, yielding a 15.70% AUC improvement over naive concatenation by adaptively weighting branch contributions based on forgery characteristics.
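The confidence-aware fusion idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: the actual learned fusion layer and confidence head are not specified in this summary, so the scalar down-weighting of the contextual branch below is an assumption.

```python
import numpy as np

def fuse(face_feat, ctx_feat, confidence):
    """Confidence-aware fusion (sketch): scale the contextual-branch
    features by an estimated confidence in (0, 1) before concatenating
    them with the face-branch features. Assumed form, not the paper's
    exact learned fusion."""
    return np.concatenate([face_feat, confidence * ctx_feat])

def naive_fuse(face_feat, ctx_feat):
    """Baseline: equal-weight concatenation, ignoring how reliable
    each branch is for the image at hand."""
    return np.concatenate([face_feat, ctx_feat])

face = np.array([0.8, 0.1])   # toy face-branch features
ctx = np.array([0.3, 0.9])    # toy contextual-branch features
fused = fuse(face, ctx, 0.1)  # contextual branch strongly down-weighted
```

For a partial facial manipulation, a low confidence suppresses the full-body contextual features; for a fully synthetic body, a high confidence lets them dominate, which is the adaptive behavior the ablation credits with the 15.70% AUC gain over naive concatenation.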
Approach
The proposed HuForDet framework features a dual-branch architecture. One branch specializes in face forgery detection, using heterogeneous experts (RGB-domain convolutional blocks and adaptive Laplacian-of-Gaussian blocks) to capture artifacts in both spatial and frequency domains across multiple scales. The second branch performs contextualized forgery detection by leveraging a Multi-Modal Large Language Model (MLLM) to analyze full-body semantic consistency, augmented by a confidence estimation mechanism that dynamically weights its contribution during feature fusion.
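A Laplacian-of-Gaussian filter of the kind the face branch builds on can be sketched in a few lines. In the paper the scale is learned adaptively; here sigma is fixed, and the brute-force convolution is purely illustrative.

```python
import numpy as np

def log_kernel(size, sigma):
    """Laplacian-of-Gaussian kernel: small sigma responds to fine
    blending boundaries, large sigma to coarse texture irregularities.
    (The paper's module adapts sigma; here it is a fixed parameter.)"""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx**2 + yy**2
    k = (r2 / (2 * sigma**2) - 1) * np.exp(-r2 / (2 * sigma**2))
    return k - k.mean()  # zero mean: flat regions give zero response

def filter2d(img, k):
    """Plain 'valid' 2-D correlation (the LoG kernel is symmetric,
    so correlation and convolution coincide)."""
    kh, kw = k.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

img = np.zeros((9, 9))
img[:, 5:] = 1.0  # synthetic step edge, standing in for a blending boundary
resp = filter2d(img, log_kernel(5, 1.0))
```

The filter fires near the step edge and stays at zero over flat regions, which is why band-pass responses like this are useful for exposing blending seams that are subtle in the raw RGB signal.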
Datasets
HuFor (newly curated), FaceForensics++ (FF++), UniAttack+, Diff-Cele (newly generated full-body celebrity images), Celeb-DF.
Model(s)
For the face forgery detection branch: DenseNet-121 as a baseline, enhanced with a Mixture-of-Experts (MoE) layer (containing standard convolutional blocks and adaptive Laplacian-of-Gaussian blocks). For the contextualized forgery detection branch: CLIP-ViT/336px (vision encoder) and Vicuna-7B (Large Language Model), with an expanded vocabulary to include a special confidence token '<s>'.
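One plausible reading of the confidence estimation mechanism is that some unbounded score derived from the special confidence token is squashed into a weight in (0, 1) for fusion. The mapping below is a hypothetical sketch; the summary does not describe the actual confidence head.

```python
import math

def confidence_weight(logit):
    """Map an unbounded confidence logit (e.g. a projection of the
    confidence token's hidden state) to a fusion weight in (0, 1)
    via a sigmoid. Hypothetical form, not the paper's exact head."""
    return 1.0 / (1.0 + math.exp(-logit))

# A strongly positive logit yields a weight near 1 (trust the MLLM
# branch); a strongly negative one yields a weight near 0.
w_high = confidence_weight(4.0)
w_low = confidence_weight(-4.0)
```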
Author countries
United States