Robust AI-Generated Face Detection with Imbalanced Data

Authors: Yamini Sri Krubha, Aryana Hou, Braden Vester, Web Walker, Xin Wang, Li Lin, Shu Hu

Published: 2025-05-04 17:02:10+00:00

AI Summary

This paper addresses the challenges of deepfake detection, particularly the class imbalance and distribution shifts from emerging generative models. It proposes a framework combining dynamic loss reweighting and ranking-based optimization to improve generalization and performance on imbalanced datasets.

Abstract

Deepfakes, created using advanced AI techniques such as Variational Autoencoders and Generative Adversarial Networks, have evolved from research and entertainment applications into tools for malicious activities, posing significant threats to digital trust. Current deepfake detection techniques have evolved from CNN-based methods focused on local artifacts to more advanced approaches using vision transformers and multimodal models like CLIP, which capture global anomalies and improve cross-domain generalization. Despite recent progress, state-of-the-art deepfake detectors still face two major challenges that limit their robustness and detection accuracy: distribution shifts from emerging generative models, and severe class imbalance between authentic and fake samples in deepfake datasets. To address these challenges, we propose a framework that combines dynamic loss reweighting and ranking-based optimization, which achieves superior generalization and performance under imbalanced dataset conditions. The code is available at https://github.com/Purdue-M2/SP_CUP.

Key findings

The proposed method significantly outperforms baseline methods in terms of AUC, accuracy, F1-score, precision, recall, and EER on the DFWild-Cup dataset. The ablation study confirms the effectiveness of the proposed loss function and MLP architecture. The sensitivity analysis shows the robustness of the method across different hyperparameter values.
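
Two of the reported metrics, AUC and EER, are less sensitive to class imbalance than plain accuracy because they depend on how fake and real samples are ranked rather than on a single threshold. The sketch below shows one simple way to compute them from detector scores. It is illustrative only: the function names are hypothetical, label 1 is assumed to mean "fake", and the EER is approximated by sweeping thresholds over the observed scores rather than interpolating the ROC curve. It is not the paper's evaluation code.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (fake, real) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def equal_error_rate(scores, labels):
    """Approximate EER: the error rate at the threshold where FPR and FNR are closest."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(scores)):
        # A sample is predicted fake when its score is at or above the threshold.
        fpr = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t) / n_neg
        fnr = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t) / n_pos
        if abs(fpr - fnr) < best_gap:
            best_gap, eer = abs(fpr - fnr), (fpr + fnr) / 2
    return eer


if __name__ == "__main__":
    scores = [0.9, 0.4, 0.6, 0.1]
    labels = [1, 1, 0, 0]
    print(roc_auc(scores, labels), equal_error_rate(scores, labels))
```

Both functions assume that each class has at least one sample; a production evaluation would more likely rely on an established metrics library.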

Approach

The authors utilize a pre-trained CLIP ViT-L/14 model as a feature extractor, followed by a trainable multi-layer perceptron (MLP). They employ a composite loss function integrating Conditional Value at Risk (CVaR) and weighted AUC loss to handle class imbalance and improve ranking performance. Sharpness-aware minimization (SAM) is used for optimization to enhance generalization.
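
To make the composite loss more concrete, the following is a minimal sketch of how a CVaR term and a pairwise AUC surrogate could be combined. It is not the authors' implementation: the summary does not give the exact formulas, so the empirical CVaR (mean of the worst fraction `alpha` of per-sample losses), the squared-hinge AUC surrogate, the `margin`, and the mixing weight `lam` are all assumptions chosen for illustration. Scores are treated as raw logits for the "fake" class (label 1).

```python
import math


def bce_losses(scores, labels):
    """Per-sample binary cross-entropy computed from raw logits (label 1 = fake)."""
    # Numerically stable form: max(s, 0) - s*y + log(1 + exp(-|s|))
    return [max(s, 0.0) - s * y + math.log1p(math.exp(-abs(s)))
            for s, y in zip(scores, labels)]


def cvar(losses, alpha):
    """Empirical CVaR: mean of the worst ceil(alpha * n) losses (assumed form)."""
    k = max(1, math.ceil(alpha * len(losses)))
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k


def pairwise_auc_loss(scores, labels, margin=1.0):
    """Squared-hinge surrogate averaged over all (fake, real) score pairs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.0  # no pairs to rank in this batch
    total = sum(max(0.0, margin - (p - n)) ** 2 for p in pos for n in neg)
    return total / (len(pos) * len(neg))


def composite_loss(scores, labels, alpha=0.5, lam=0.5):
    """Hypothetical mix: lam * CVaR(BCE) + (1 - lam) * AUC surrogate."""
    return (lam * cvar(bce_losses(scores, labels), alpha)
            + (1 - lam) * pairwise_auc_loss(scores, labels))


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    print(composite_loss([3.0, -3.0, 2.0, -2.0], labels))  # well-separated scores
    print(composite_loss([-1.0, 1.0, 0.2, 0.1], labels))   # poorly ranked scores
```

The CVaR term concentrates the gradient on the hardest samples in a batch, while the pairwise term rewards ranking fake samples above real ones regardless of how many of each the batch contains. In a training setup like the one described, such a loss would be computed on the MLP outputs with an autodiff framework and minimized with SAM; the plain-Python version here only illustrates the arithmetic.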

Datasets

DFWild-Cup dataset, a compilation of eight standard datasets including Celeb-DF-v1, Celeb-DF-v2, FaceForensics++, DeepfakeDetection, FaceShifter, UADFV, Deepfake Detection Challenge Preview, and Deepfake Detection Challenge.

Model(s)

CLIP ViT-L/14 (frozen feature extractor) with a 3-layer MLP.

Author countries

USA
