DeiTFake: Deepfake Detection Model using DeiT Multi-Stage Training

Authors: Saksham Kumar, Ashish Singh, Srinivasarao Thota, Sunil Kumar Singh, Chandan Kumar

Published: 2025-11-15 05:55:09+00:00

AI Summary

DeiTFake proposes a DeiT-based transformer for deepfake detection using a novel two-stage progressive training strategy. This approach employs an initial transfer-learning phase with standard augmentations, followed by a fine-tuning phase with advanced affine and deepfake-specific augmentations. DeiTFake achieves 99.22% accuracy and an AUROC of 0.9997 on the OpenForensics dataset, outperforming existing baselines.

Abstract

Deepfakes are major threats to the integrity of digital media. We propose DeiTFake, a DeiT-based transformer and a novel two-stage progressive training strategy with increasing augmentation complexity. The approach applies an initial transfer-learning phase with standard augmentations, followed by a fine-tuning phase using advanced affine and deepfake-specific augmentations. DeiT's knowledge-distillation mechanism captures subtle manipulation artifacts, increasing the robustness of the detector. Trained on the OpenForensics dataset (190,335 images), DeiTFake achieves 98.71% accuracy after stage one and 99.22% accuracy with an AUROC of 0.9997 after stage two, outperforming the latest OpenForensics baselines. We analyze augmentation impact and training schedules, and provide practical benchmarks for facial deepfake detection.


Key findings
DeiTFake achieved a state-of-the-art accuracy of 99.22% and an AUROC of 0.9997 on the OpenForensics dataset, surpassing previous baselines. The two-stage progressive training with increasing augmentation complexity was crucial for capturing subtle manipulation artifacts and improving the model's robustness and generalization, particularly against warping artifacts.
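The headline AUROC of 0.9997 is the probability that a randomly chosen fake image receives a higher detector score than a randomly chosen real one. As a reminder of what that metric measures, here is a minimal pure-Python sketch of the rank-based (Mann-Whitney) AUROC computation; it is illustrative only and not the paper's evaluation code.

```python
def auroc(labels, scores):
    """Rank-based AUROC: fraction of (positive, negative) pairs where the
    positive example scores higher, counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy scores: one fake (label 1) ranked below one real (label 0).
print(auroc([1, 0, 1, 0], [0.9, 0.8, 0.2, 0.1]))  # → 0.75
```

An AUROC near 1.0 means the score distributions of real and fake images barely overlap, which is a stronger statement than accuracy alone since it is threshold-independent.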
Approach
The approach uses a DeiT Vision Transformer with a two-stage training strategy. Stage one involves transfer learning with standard geometric augmentations, while stage two fine-tunes the model with advanced affine transformations and deepfake-specific augmentations to enhance robustness against real-world manipulations. Knowledge distillation is leveraged to capture subtle manipulation artifacts.
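The two-stage schedule above can be sketched as a simple progressive-augmentation config. Everything concrete here, the stage epoch splits and the specific augmentation names, is an illustrative assumption, not the authors' code; the summary only specifies "standard geometric" augmentations for stage one and "advanced affine and deepfake-specific" augmentations for stage two.

```python
# Hypothetical sketch of a two-stage progressive augmentation schedule.
# Epoch counts and augmentation names are assumptions for illustration.
STAGES = [
    {
        "name": "stage1_transfer_learning",
        "epochs": range(0, 10),  # assumed split; not given in the summary
        "augmentations": ["resize_224", "horizontal_flip", "small_rotation"],
    },
    {
        "name": "stage2_fine_tuning",
        "epochs": range(10, 25),
        # Advanced affine plus deepfake-specific distortions.
        "augmentations": ["random_affine", "color_jitter",
                          "jpeg_compression", "local_warping"],
    },
]

def active_stage(epoch: int) -> dict:
    """Return the stage config whose epoch range contains `epoch`."""
    for stage in STAGES:
        if epoch in stage["epochs"]:
            return stage
    raise ValueError(f"epoch {epoch} is outside the training schedule")
```

The point of the progression is that the model first adapts its pretrained features to the face domain under mild perturbations, then hardens against the warping and compression artifacts typical of manipulated images.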
Datasets
OpenForensics dataset (190,335 images)
Model(s)
DeiT (Data-Efficient Image Transformer), specifically the facebook/deit-base-patch16-224 checkpoint
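A minimal sketch of loading the named checkpoint for binary real/fake classification via Hugging Face `transformers`; replacing the ImageNet head with a two-way head is an assumption about the setup, since the summary does not show the authors' training code.

```python
# Sketch: load the DeiT backbone named in the paper and swap its
# 1000-way ImageNet head for a 2-way real/fake head (assumed setup).
MODEL_ID = "facebook/deit-base-patch16-224"

def load_detector():
    from transformers import DeiTForImageClassification, DeiTImageProcessor
    processor = DeiTImageProcessor.from_pretrained(MODEL_ID)
    model = DeiTForImageClassification.from_pretrained(
        MODEL_ID,
        num_labels=2,                 # real vs. fake
        ignore_mismatched_sizes=True,  # discard the pretrained 1000-way head
    )
    return processor, model
```

Calling `load_detector()` downloads the pretrained weights, so it requires network access on first use.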
Author countries
India