Seeing Through Deepfakes: A Human-Inspired Framework for Multi-Face Detection

Authors: Juan Hu, Shaojing Fan, Terence Sim

Published: 2025-07-20 03:53:52+00:00

AI Summary

This paper proposes HICOM, a novel framework for multi-face deepfake detection that leverages human cognitive cues identified through a series of human studies. HICOM outperforms existing methods by incorporating four key cues: scene-motion coherence, inter-face appearance compatibility, interpersonal gaze alignment, and face-body consistency, leading to improved accuracy and generalization.

Abstract

Multi-face deepfake videos are becoming increasingly prevalent, often appearing in natural social settings that challenge existing detection methods. Most current approaches excel at single-face detection but struggle in multi-face scenarios, due to a lack of awareness of crucial contextual cues. In this work, we develop a novel approach that leverages human cognition to analyze and defend against multi-face deepfake videos. Through a series of human studies, we systematically examine how people detect deepfake faces in social settings. Our quantitative analysis reveals four key cues humans rely on: scene-motion coherence, inter-face appearance compatibility, interpersonal gaze alignment, and face-body consistency. Guided by these insights, we introduce HICOM, a novel framework designed to detect every fake face in multi-face scenarios. Extensive experiments on benchmark datasets show that HICOM improves average accuracy by 3.3% in in-dataset detection and 2.8% under real-world perturbations. Moreover, it outperforms existing methods by 5.8% on unseen datasets, demonstrating the generalization of human-inspired cues. HICOM further enhances interpretability by incorporating an LLM to provide human-readable explanations, making detection results more transparent and convincing. Our work sheds light on involving human factors to enhance defense against deepfakes.


Key findings
HICOM improves average accuracy over existing methods by 3.3% in in-dataset detection, 2.8% under real-world perturbations, and 5.8% on unseen datasets, demonstrating both robustness and generalization. Furthermore, HICOM surpasses human accuracy in multi-face deepfake detection.
Approach
HICOM uses a human-inspired approach: through human studies, it identifies four key cues humans use to detect deepfakes in multi-face videos. It then builds a four-module framework (scene-motion, inter-face appearance, gaze, and face-body) to detect these cues, fusing their outputs for comprehensive deepfake detection.
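The fusion step described above can be sketched as a simple late-fusion scheme. This is a hypothetical illustration, not the paper's actual implementation: the function names, equal weighting, and threshold are all assumptions made here for clarity.

```python
# Hypothetical sketch of HICOM-style cue fusion. Each of the four
# human-inspired modules is assumed to output a per-face fake
# probability in [0, 1]; the framework then fuses them into a single
# score per face. Equal weights and a 0.5 threshold are placeholders,
# not values from the paper.

def fuse_cues(scene_motion, appearance, gaze, face_body, weights=None):
    """Combine the four cue-module scores for one face into a fused fake score."""
    scores = [scene_motion, appearance, gaze, face_body]
    if weights is None:
        weights = [0.25] * 4  # illustrative equal weighting
    return sum(w * s for w, s in zip(weights, scores))

def classify_faces(per_face_scores, threshold=0.5):
    """Label each detected face as fake (True) or real (False).

    per_face_scores: list of (scene_motion, appearance, gaze, face_body)
    tuples, one per face in the video.
    """
    return [fuse_cues(*scores) >= threshold for scores in per_face_scores]
```

For example, a face whose four cue scores are all high would be flagged as fake, while a face with uniformly low scores would be labeled real; in practice the fusion weights would be learned rather than fixed.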
Datasets
FFIW, OpenForensics, DF-Platter, ManualFake, FF++ (for single-face adaptation)
Model(s)
ResNet, Transformer, Inference Network (each module uses its own architecture; the paper does not name a single unified model)
Author countries
Singapore