MoE-FFD: Mixture of Experts for Generalized and Parameter-Efficient Face Forgery Detection

Authors: Chenqi Kong, Anwei Luo, Peijun Bao, Yi Yu, Haoliang Li, Zengwei Zheng, Shiqi Wang, Alex C. Kot

Published: 2024-04-12 13:02:08+00:00

AI Summary

MoE-FFD is a parameter-efficient face forgery detection approach that combines Mixture-of-Experts modules with a Vision Transformer backbone. It addresses the limitations of existing ViT-based methods by updating only lightweight Low-Rank Adaptation (LoRA) and Adapter layers for efficient training, and by combining global (transformer) and local (CNN-style) forgery clue extraction.
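
A minimal sketch (not the authors' released code) of how a LoRA update can augment a frozen ViT projection: the pretrained linear weight stays fixed and only the low-rank factors are trained. Class names, the rank, and the scaling factor here are illustrative assumptions.

```python
# Hedged sketch: LoRA-style low-rank update on top of a frozen linear projection.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)   # frozen ImageNet-pretrained weight
        self.base.bias.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)  # trainable down-projection
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))        # trainable up-projection
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen path plus the scaled low-rank correction B @ A applied to x.
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)
```

Only `rank * (in_features + out_features)` extra parameters are learned per layer, which is where the parameter efficiency comes from.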

Abstract

Deepfakes have recently raised significant trust issues and security concerns among the public. Compared to CNN face forgery detectors, ViT-based methods take advantage of the expressivity of transformers, achieving superior detection performance. However, these approaches still exhibit the following limitations: (1) Fully fine-tuning ViT-based models from ImageNet weights demands substantial computational and storage resources; (2) ViT-based methods struggle to capture local forgery clues, leading to model bias; (3) These methods limit their scope on only one or few face forgery features, resulting in limited generalizability. To tackle these challenges, this work introduces Mixture-of-Experts modules for Face Forgery Detection (MoE-FFD), a generalized yet parameter-efficient ViT-based approach. MoE-FFD only updates lightweight Low-Rank Adaptation (LoRA) and Adapter layers while keeping the ViT backbone frozen, thereby achieving parameter-efficient training. Moreover, MoE-FFD leverages the expressivity of transformers and local priors of CNNs to simultaneously extract global and local forgery clues. Additionally, novel MoE modules are designed to scale the model's capacity and smartly select optimal forgery experts, further enhancing forgery detection performance. Our proposed learning scheme can be seamlessly adapted to various transformer backbones in a plug-and-play manner. Extensive experimental results demonstrate that the proposed method achieves state-of-the-art face forgery detection performance with significantly reduced parameter overhead. The code is released at: https://github.com/LoveSiameseCat/MoE-FFD.
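
The abstract's "smartly select optimal forgery experts" suggests a learned gate scoring several candidate expert branches per input. Below is a hedged sketch of such a Mixture-of-Experts layer with top-k routing; the paper's actual gating design, expert definitions, and routing granularity may differ, and all names here are illustrative.

```python
# Hedged sketch: gate-weighted selection among several token-wise expert modules.
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    def __init__(self, dim: int, experts: nn.ModuleList, top_k: int = 1):
        super().__init__()
        self.experts = experts                      # each expert maps (batch, tokens, dim) -> same shape
        self.gate = nn.Linear(dim, len(experts))    # lightweight router over experts
        self.top_k = top_k

    def forward(self, x):                           # x: (batch, tokens, dim)
        scores = self.gate(x.mean(dim=1))           # (batch, num_experts), gating on pooled tokens
        weights = scores.softmax(dim=-1)
        topv, topi = weights.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for b in range(x.size(0)):                  # route each sample to its selected experts
            for w, i in zip(topv[b], topi[b]):
                out[b] = out[b] + w * self.experts[int(i)](x[b:b + 1]).squeeze(0)
        return out

# Example usage (hypothetical expert branches):
# experts = nn.ModuleList([nn.Linear(768, 768) for _ in range(4)])
# moe = MoELayer(768, experts, top_k=2)
```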


Key findings
MoE-FFD achieves state-of-the-art performance across multiple face forgery detection benchmarks, demonstrating stronger generalizability and robustness than existing methods. Because only the lightweight LoRA and Adapter layers are updated, the model is parameter-efficient and substantially reduces computational and storage overhead while maintaining high accuracy. The MoE modules are shown to be effective at selecting optimal experts for different inputs and datasets.
Approach
MoE-FFD integrates Mixture-of-Experts modules into a Vision Transformer (ViT) architecture. It updates only lightweight LoRA and Adapter layers while keeping the ViT backbone's ImageNet weights frozen, enabling parameter-efficient training that leverages both the global context of transformers and the local priors of CNN-like adapters. The MoE modules scale model capacity and select suitable forgery experts for each input, and the learning scheme can be plugged into various transformer backbones. A sketch of the adapter and the freezing step is given below.
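
A sketch, under stated assumptions, of the two remaining pieces: a CNN-style Adapter that mixes neighbouring patch tokens with a depthwise convolution to inject local priors, and a helper that freezes the ViT backbone while keeping only LoRA/Adapter parameters trainable. The adapter design, the 14x14 patch grid, and the substring check on parameter names are hypothetical illustrations, not the authors' implementation.

```python
# Hedged sketch: convolutional Adapter for local clues + backbone freezing helper.
import torch
import torch.nn as nn

class ConvAdapter(nn.Module):
    def __init__(self, dim: int, hidden: int = 64, grid: int = 14):
        super().__init__()
        self.grid = grid                             # patch tokens are assumed to form a grid x grid map
        self.down = nn.Linear(dim, hidden)
        self.dwconv = nn.Conv2d(hidden, hidden, kernel_size=3, padding=1, groups=hidden)
        self.up = nn.Linear(hidden, dim)

    def forward(self, tokens):                       # tokens: (batch, grid*grid, dim), patch tokens only
        b, n, _ = tokens.shape
        h = self.down(tokens)                        # (b, n, hidden)
        h = h.transpose(1, 2).reshape(b, -1, self.grid, self.grid)
        h = self.dwconv(h)                           # CNN-like mixing of neighbouring patches
        h = h.flatten(2).transpose(1, 2)
        return tokens + self.up(h)                   # residual adapter update

def freeze_backbone_except_peft(model: nn.Module):
    """Freeze all weights, then re-enable only LoRA/Adapter parameters (naming is an assumption)."""
    for name, p in model.named_parameters():
        p.requires_grad = ("lora" in name.lower()) or ("adapter" in name.lower())
```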
Datasets
FaceForensics++ (FF++), Celeb-DF v2 (CDF), WildDeepfake (WDF), Deepfake Detection Challenge Preview (DFDC-P), DeepfakeDetection (DFD), DeeperForensics-1.0 (DFR)
Model(s)
Vision Transformer (ViT) with Low-Rank Adaptation (LoRA) and Adapter layers, Mixture-of-Experts (MoE) modules
Author countries
Singapore, China, Hong Kong