Seeing Through Deepfakes: A Human-Inspired Framework for Multi-Face Detection
Authors: Juan Hu, Shaojing Fan, Terence Sim
Published: 2025-07-20 03:53:52+00:00
AI Summary
This paper proposes HICOM, a novel framework for multi-face deepfake detection that leverages human cognitive cues identified through a series of human studies. HICOM outperforms existing methods by incorporating four key cues: scene-motion coherence, inter-face appearance compatibility, interpersonal gaze alignment, and face-body consistency, leading to improved accuracy and generalization.
Abstract
Multi-face deepfake videos are becoming increasingly prevalent, often appearing in natural social settings that challenge existing detection methods. Most current approaches excel at single-face detection but struggle in multi-face scenarios, due to a lack of awareness of crucial contextual cues. In this work, we develop a novel approach that leverages human cognition to analyze and defend against multi-face deepfake videos. Through a series of human studies, we systematically examine how people detect deepfake faces in social settings. Our quantitative analysis reveals four key cues humans rely on: scene-motion coherence, inter-face appearance compatibility, interpersonal gaze alignment, and face-body consistency. Guided by these insights, we introduce HICOM, a novel framework designed to detect every fake face in multi-face scenarios. Extensive experiments on benchmark datasets show that HICOM improves average accuracy by 3.3% in in-dataset detection and 2.8% under real-world perturbations. Moreover, it outperforms existing methods by 5.8% on unseen datasets, demonstrating the generalization of human-inspired cues. HICOM further enhances interpretability by incorporating an LLM to provide human-readable explanations, making detection results more transparent and convincing. Our work sheds light on involving human factors to enhance defense against deepfakes.