SUEDE: Shared Unified Experts for Physical-Digital Face Attack Detection Enhancement

Authors: Zuying Xie, Changtao Miao, Ajian Liu, Jiabao Guo, Feng Li, Dan Guo, Yunfeng Diao

Published: 2025-04-07 08:17:54+00:00

AI Summary

SUEDE, a Shared Unified Experts model, is proposed for unified detection of physical and digital face attacks. It uses an always-active shared expert to capture features common to both attack types and selectively activated routed experts for attack-specific features, with CLIP as the base network to align visual and text representations.

Abstract

Face recognition systems are vulnerable to physical attacks (e.g., printed photos) and digital threats (e.g., DeepFake), which are currently studied as separate visual tasks, namely Face Anti-Spoofing and Forgery Detection. The inherent differences among attack types make it hard to identify a common feature space, and therefore hard to develop a unified framework that detects both attack types simultaneously. Inspired by the efficacy of Mixture-of-Experts (MoE) in learning across diverse domains, we explore utilizing multiple experts to learn the distinct features of various attack types. However, the feature distributions of physical and digital attacks partly overlap and partly differ. This suggests that relying solely on distinct experts to learn the unique features of each attack type may overlook shared knowledge between them. To address these issues, we propose SUEDE, the Shared Unified Experts for Physical-Digital Face Attack Detection Enhancement. SUEDE combines a shared expert (always activated) to capture common features for both attack types and multiple routed experts (selectively activated) for specific attack types. Further, we integrate CLIP as the base network to ensure the shared expert benefits from prior visual knowledge and to align visual-text representations in a unified space. Extensive experimental results demonstrate that SUEDE achieves superior performance compared to state-of-the-art unified detection methods.


Key findings
SUEDE outperforms state-of-the-art methods on UniAttackData and JFSFDB datasets in terms of ACER, ACC, AUC, and EER, demonstrating its effectiveness in unified attack detection. The model shows superior generalization to unseen attack types and benefits significantly from the shared expert and CLIP integration.
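For readers unfamiliar with the reported metrics, the following is a minimal sketch of how ACER and ACC are commonly computed from binary decisions. It assumes the simplified, pooled definition (APCER over all attack samples, BPCER over all bona fide samples, ACER as their mean) and a label convention of 1 = attack, 0 = bona fide; the function name and the toy data are hypothetical, and the paper's exact evaluation protocol is not stated in this summary.

```python
def detection_metrics(labels, predictions):
    """Compute APCER, BPCER, ACER and ACC from binary labels and decisions.

    Assumed convention (not taken from the paper): 1 = attack, 0 = bona fide.
    """
    attack_preds = [p for y, p in zip(labels, predictions) if y == 1]
    bona_fide_preds = [p for y, p in zip(labels, predictions) if y == 0]
    if not attack_preds or not bona_fide_preds:
        raise ValueError("need at least one attack and one bona fide sample")

    # APCER: share of attack samples wrongly accepted as bona fide.
    apcer = sum(1 for p in attack_preds if p == 0) / len(attack_preds)
    # BPCER: share of bona fide samples wrongly rejected as attacks.
    bpcer = sum(1 for p in bona_fide_preds if p == 1) / len(bona_fide_preds)
    # ACC: share of all samples classified correctly.
    acc = sum(1 for y, p in zip(labels, predictions) if y == p) / len(labels)

    return {"APCER": apcer, "BPCER": bpcer, "ACER": (apcer + bpcer) / 2, "ACC": acc}


# Toy data: 4 attacks (one missed) and 4 bona fide samples (two rejected).
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_predictions = [1, 1, 1, 0, 0, 0, 1, 1]
toy_metrics = detection_metrics(toy_labels, toy_predictions)
print(toy_metrics)
```

Lower ACER and EER are better, while higher ACC and AUC are better. AUC and EER are computed from continuous scores rather than hard decisions and are not covered by this sketch.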
Approach
SUEDE combines a shared expert (always active) and routed experts (selectively activated) to learn common and specific features of physical and digital face attacks. It uses CLIP as a base network to benefit from prior visual knowledge and align visual-text representations.
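The shared-plus-routed idea can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the expert form (a single linear map here), the softmax gate, the top-k selection, the additive combination, and every name in the code (`SharedRoutedMoE`, `route`, `forward`, `top_k`) are assumptions made for illustration only.

```python
import math
import random


def linear(weights, x):
    """Multiply a weight matrix (list of rows) by a feature vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class SharedRoutedMoE:
    """Toy MoE layer: one always-active shared expert plus top-k routed experts."""

    def __init__(self, dim, num_routed, top_k, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.shared = init(dim, dim)                           # always applied
        self.routed = [init(dim, dim) for _ in range(num_routed)]
        self.gate = init(num_routed, dim)                      # one score per routed expert
        self.top_k = top_k

    def route(self, x):
        """Return {expert_index: weight} for the top-k routed experts."""
        probs = softmax(linear(self.gate, x))
        top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[: self.top_k]
        norm = sum(probs[i] for i in top)
        return {i: probs[i] / norm for i in top}               # renormalise over selected experts

    def forward(self, x):
        out = linear(self.shared, x)                           # shared expert: every input
        for idx, weight in self.route(x).items():              # routed experts: selected inputs only
            expert_out = linear(self.routed[idx], x)
            out = [o + weight * e for o, e in zip(out, expert_out)]
        return out


moe = SharedRoutedMoE(dim=4, num_routed=3, top_k=1)
features = [1.0, 0.5, -0.3, 2.0]
print(moe.route(features))
print(moe.forward(features))
```

In this sketch the shared expert sees every input, so it can model what physical and digital attacks have in common, while the gate decides which routed expert contributes attack-specific features.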
Datasets
UniAttackData (with Protocols 1 and 2), JFSFDB
Model(s)
Vision Transformer (ViT-B/16) with a Mixture-of-Experts (MoE) architecture incorporating a shared expert and multiple routed experts; CLIP is used as the base network.
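As a rough illustration of CLIP-style visual-text alignment, the sketch below scores an image embedding against one text embedding per class by cosine similarity and converts the scores to probabilities. The embeddings are placeholder vectors rather than real CLIP outputs, and the class names, the `scale` value, and the function names are assumptions; how SUEDE actually builds its prompts and classification head is not described in this summary.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_by_text_alignment(image_embedding, text_embeddings, scale=100.0):
    """Softmax over scaled image-text cosine similarities.

    text_embeddings maps a class name to its (placeholder) text embedding.
    scale is an assumed temperature-style factor, not a value from the paper.
    """
    logits = {name: scale * cosine(image_embedding, emb) for name, emb in text_embeddings.items()}
    peak = max(logits.values())
    exps = {name: math.exp(value - peak) for name, value in logits.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


# Placeholder 2-D embeddings standing in for encoded class prompts.
toy_text_embeddings = {"live": [1.0, 0.0], "attack": [0.0, 1.0]}
toy_image_embedding = [0.9, 0.1]
toy_probs = classify_by_text_alignment(toy_image_embedding, toy_text_embeddings)
print(toy_probs)
```

Because both modalities live in one embedding space, the class decision reduces to asking which text embedding the image embedding lies closest to.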
Author countries
China