Deepfake Detection via Knowledge Injection

Authors: Tonghui Li, Yuanfang Guo, Zeming Liu, Heqi Peng, Yunhong Wang

Published: 2025-03-04 11:11:14+00:00

AI Summary

This paper proposes Knowledge Injection based deepfake Detection (KID), a novel approach that injects knowledge about real and fake data distributions into a ViT-based backbone model. This improves generalization performance in deepfake detection by addressing the limitations of existing methods that overlook the essential role of real data knowledge.

Abstract

Deepfake detection technologies have become vital because current generative AI models can generate realistic deepfakes, which may be utilized for malicious purposes. Existing deepfake detection methods either rely on developing classification methods to better fit the distributions of the training data, or exploit forgery synthesis mechanisms to learn a more comprehensive forgery distribution. Unfortunately, these methods tend to overlook the essential role of real data knowledge, which limits their generalization ability in processing unseen real and fake data. To tackle these challenges, in this paper, we propose a simple and novel approach, named Knowledge Injection based deepfake Detection (KID), by constructing a multi-task learning based knowledge injection framework, which can be easily plugged into existing ViT-based backbone models, including foundation models. Specifically, a knowledge injection module is proposed to learn and inject necessary knowledge into the backbone model, to achieve more accurate modeling of the distributions of real and fake data. A coarse-grained forgery localization branch is constructed to learn the forgery locations in a multi-task learning manner, to enrich the learned forgery knowledge for the knowledge injection module. Two layer-wise suppression and contrast losses are proposed to emphasize the knowledge of real data in the knowledge injection module, to further balance the portions of the real and fake knowledge. Extensive experiments have demonstrated that our KID possesses excellent compatibility with different scales of ViT-based backbone models, and achieves state-of-the-art generalization performance while enhancing the training convergence speed.
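The abstract describes a plug-in framework rather than providing code, but the overall design can be illustrated with a minimal PyTorch sketch. Everything below is an assumption made for illustration: the injection module is modeled as a lightweight bottleneck adapter after each transformer block of torchvision's ViT-B/16, and the coarse-grained localization branch as a per-patch linear head. The paper's actual module design, dimensions, and heads may differ.

```python
import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights


class KnowledgeInjectionAdapter(nn.Module):
    """Hypothetical lightweight adapter that learns real/fake knowledge
    and injects it back into a transformer block's token features."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Residual injection: backbone features plus learned knowledge.
        return tokens + self.up(self.act(self.down(tokens)))


class KIDStyleDetector(nn.Module):
    """ViT-B/16 backbone with per-block injection adapters, a binary
    real/fake head, and a coarse-grained forgery-localization head
    (illustrative stand-in, not the authors' exact architecture)."""

    def __init__(self, num_patches_side: int = 14):  # 224x224 input, 16x16 patches
        super().__init__()
        self.backbone = vit_b_16(weights=ViT_B_16_Weights.DEFAULT)
        dim = self.backbone.hidden_dim  # 768 for ViT-B/16
        self.adapters = nn.ModuleList(
            [KnowledgeInjectionAdapter(dim) for _ in self.backbone.encoder.layers]
        )
        self.cls_head = nn.Linear(dim, 2)   # real vs. fake
        self.loc_head = nn.Linear(dim, 1)   # per-patch forgery score
        self.num_patches_side = num_patches_side

    def forward(self, images: torch.Tensor):
        x = self.backbone._process_input(images)                 # patch embedding
        cls_tok = self.backbone.class_token.expand(x.shape[0], -1, -1)
        x = torch.cat([cls_tok, x], dim=1)
        x = x + self.backbone.encoder.pos_embedding
        for block, adapter in zip(self.backbone.encoder.layers, self.adapters):
            x = adapter(block(x))                                 # inject knowledge per block
        x = self.backbone.encoder.ln(x)
        logits = self.cls_head(x[:, 0])                           # CLS token -> real/fake
        loc_map = self.loc_head(x[:, 1:]).squeeze(-1)             # patch tokens -> coarse mask
        loc_map = loc_map.view(-1, self.num_patches_side, self.num_patches_side)
        return logits, loc_map
```

Because only the adapters and the two heads introduce new parameters, such a design stays compatible with differently scaled ViT backbones, consistent with the compatibility claim in the abstract.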


Key findings
KID achieves state-of-the-art generalization performance across multiple datasets, outperforming existing methods on high-quality forgery datasets. The approach demonstrates excellent compatibility with different ViT-based models and improves training convergence speed.
Approach
KID uses a multi-task learning framework in which a knowledge injection module learns knowledge about real and fake data and injects it into a ViT-based backbone. A coarse-grained forgery localization branch enriches the learned forgery knowledge, while layer-wise suppression and contrast losses balance the portions of real and fake knowledge.
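The paper's exact loss formulations are not reproduced here; the sketch below is one plausible reading of "layer-wise suppression and contrast" under assumed definitions: a suppression term that penalizes the injected-feature response on real samples at every block, and a per-layer contrastive term that pulls real features toward a real centroid and pushes fake features away.

```python
import torch
import torch.nn.functional as F


def layerwise_suppression_loss(layer_feats, labels):
    """Hypothetical suppression term: keep the injected knowledge small on
    real samples so the adapters encode real data compactly.

    layer_feats: list of (B, N, D) adapter outputs, one per ViT block
    labels: (B,) tensor with 0 = real, 1 = fake
    """
    real_mask = labels == 0
    if real_mask.sum() == 0:
        return layer_feats[0].new_zeros(())
    loss = 0.0
    for feats in layer_feats:
        # Penalize the squared magnitude of injected features on real images only.
        loss = loss + feats[real_mask].pow(2).mean()
    return loss / len(layer_feats)


def layerwise_contrast_loss(layer_feats, labels, margin: float = 1.0):
    """Hypothetical per-layer contrastive term: pull real features toward
    the real centroid and push fake features at least `margin` away."""
    real_mask, fake_mask = labels == 0, labels == 1
    if real_mask.sum() == 0 or fake_mask.sum() == 0:
        return layer_feats[0].new_zeros(())
    loss = 0.0
    for feats in layer_feats:
        pooled = feats.mean(dim=1)                    # (B, D) token-averaged features
        real_center = pooled[real_mask].mean(dim=0)   # centroid of real samples
        d_real = (pooled[real_mask] - real_center).norm(dim=-1)
        d_fake = (pooled[fake_mask] - real_center).norm(dim=-1)
        loss = loss + d_real.mean() + F.relu(margin - d_fake).mean()
    return loss / len(layer_feats)
```

In a multi-task setup these terms would be weighted and added to the binary classification loss and the coarse localization loss during training.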
Datasets
FF++ (FaceForensics++), Celeb-DF-v2, DeepFakeDetection, DeepFake Detection Challenge (DFDC) public test set, WildDeepfake
Model(s)
ViT-based models (ViT-B/16, DINOv2, LeViT), RetinaFace (for face detection)
Author countries
China