SELFI: Selective Fusion of Identity for Generalizable Deepfake Detection

Authors: Younghun Kim, Minsuk Jang, Myung-Joon Kwon, Wonjun Lee, Changick Kim

Published: 2025-06-21 05:11:35+00:00

AI Summary

SELFI is a deepfake detection framework that dynamically modulates the use of identity features for improved generalization. It achieves this by using a Forgery-Aware Identity Adapter and an Identity-Aware Fusion Module to selectively integrate identity and visual features based on per-sample relevance.

Abstract

Face identity provides a powerful signal for deepfake detection. Prior studies show that even when not explicitly modeled, classifiers often learn identity features implicitly. This has led to conflicting views: some suppress identity cues to reduce bias, while others rely on them as forensic evidence. To reconcile these views, we analyze two hypotheses: (1) whether face identity alone is discriminative for detecting deepfakes, and (2) whether such identity features generalize poorly across manipulation methods. Our experiments confirm that identity is informative but context-dependent. While some manipulations preserve identity-consistent artifacts, others distort identity cues and harm generalization. We argue that identity features should neither be blindly suppressed nor relied upon, but instead be explicitly modeled and adaptively controlled based on per-sample relevance. We propose SELFI (SELective Fusion of Identity), a generalizable detection framework that dynamically modulates identity usage. SELFI consists of: (1) a Forgery-Aware Identity Adapter (FAIA) that extracts identity embeddings from a frozen face recognition model and projects them into a forgery-relevant space via auxiliary supervision; and (2) an Identity-Aware Fusion Module (IAFM) that selectively integrates identity and visual features using a relevance-guided fusion mechanism. Experiments on four benchmarks show that SELFI improves cross-manipulation generalization, outperforming prior methods by an average of 3.1% AUC. On the challenging DFDC dataset, SELFI exceeds the previous best by 6%. Code will be released upon paper acceptance.
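The relevance-guided fusion can be illustrated with a simple gating formulation. The abstract does not spell out the exact equations, so the form below, including the concatenation-based relevance head g and the convex combination, is an illustrative assumption rather than the authors' exact design:

\alpha = \sigma\big( g([\, f_{\mathrm{id}} ;\, f_{\mathrm{vis}} \,]) \big), \qquad f_{\mathrm{fused}} = \alpha \, f_{\mathrm{id}} + (1 - \alpha) \, f_{\mathrm{vis}}

Here f_id denotes the FAIA-projected identity feature, f_vis the visual backbone feature, g a learned relevance head, and \sigma the sigmoid; \alpha plays the role of the per-sample relevance score.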


Key findings

SELFI outperforms existing methods on four benchmark datasets, improving cross-manipulation generalization by an average of 3.1% AUC. On the DFDC dataset, it surpasses the previous best by 6%. Ablation studies confirm that the performance gains are due to the selective integration of identity features and not simply feature ensembling.
Approach

SELFI uses a Forgery-Aware Identity Adapter (FAIA) to project identity embeddings into a forgery-relevant space and an Identity-Aware Fusion Module (IAFM) to selectively fuse identity and visual features based on a predicted relevance score. This adaptive fusion improves generalization across different manipulation methods.
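To make the two-module design concrete, below is a minimal PyTorch sketch. Only the module names (FAIA, IAFM) and their roles come from the paper; all dimensions, layer shapes, the auxiliary real/fake head, and the sigmoid-gated fusion rule are illustrative assumptions.

# Minimal sketch of SELFI's two modules, assuming PyTorch. Dimensions,
# layer shapes, and the exact fusion rule are illustrative assumptions;
# only the module names (FAIA, IAFM) and their roles come from the paper.
import torch
import torch.nn as nn

class FAIA(nn.Module):
    """Forgery-Aware Identity Adapter (sketch).

    Projects a frozen face-recognition embedding into a forgery-relevant
    space. The paper trains this with auxiliary supervision; here an
    auxiliary real/fake head stands in for that signal (assumption).
    """
    def __init__(self, id_dim=512, feat_dim=768):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(id_dim, feat_dim),
            nn.GELU(),
            nn.Linear(feat_dim, feat_dim),
        )
        self.aux_head = nn.Linear(feat_dim, 2)  # auxiliary real/fake logits

    def forward(self, id_embed):
        f_id = self.proj(id_embed)
        return f_id, self.aux_head(f_id)

class IAFM(nn.Module):
    """Identity-Aware Fusion Module (sketch).

    Predicts a per-sample relevance score from the concatenated features
    and uses it to gate how much identity information enters the fused
    representation (one plausible reading of "relevance-guided fusion").
    """
    def __init__(self, feat_dim=768):
        super().__init__()
        self.relevance = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim),
            nn.GELU(),
            nn.Linear(feat_dim, 1),
            nn.Sigmoid(),
        )

    def forward(self, f_id, f_vis):
        alpha = self.relevance(torch.cat([f_id, f_vis], dim=-1))  # (B, 1)
        return alpha * f_id + (1.0 - alpha) * f_vis, alpha

if __name__ == "__main__":
    # Toy forward pass: a 512-d identity embedding (e.g. from a frozen
    # IResNet100) and a 768-d visual feature (e.g. from a CLIP backbone).
    faia, iafm = FAIA(), IAFM()
    id_embed = torch.randn(4, 512)
    f_vis = torch.randn(4, 768)
    f_id, aux_logits = faia(id_embed)
    fused, alpha = iafm(f_id, f_vis)
    print(fused.shape, alpha.squeeze(-1))  # torch.Size([4, 768]) and 4 scores

In training, the auxiliary head would receive the same real/fake labels as the main classifier so that the projection becomes forgery-aware, and the fused feature would feed the final lightweight classifier; both of these wiring details are inferred from the summary rather than confirmed by the source.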
Datasets

FaceForensics++ (FF++), Celeb-DF v2 (CDFv2), DeepfakeDetection (DFD), Deepfake Detection Challenge (DFDC), and its preview version (DFDCP)
Model(s)

CLIP (as backbone), IResNet100 (for identity embeddings), lightweight classifiers for identity and final classification, ResNet34, EfficientNet-B4
Author countries

South Korea