Fine-Grained DINO Tuning with Dual Supervision for Face Forgery Detection

Authors: Tianxiang Zhang, Peipeng Yu, Zhihua Xia, Longchen Dai, Xiaoyu Zhou, Hui Gao

Published: 2025-11-15 08:57:21+00:00

Comment: Accepted by AAAI 2026

AI Summary

This paper introduces the DeepFake Fine-Grained Adapter (DFF-Adapter) for DINOv2, which enhances deepfake detection by addressing existing fine-tuning approaches' insensitivity to the distinct artifacts left by different forgery methods. It integrates lightweight multi-head LoRA modules into every transformer block, enabling efficient backbone adaptation under dual supervision for authenticity and fine-grained manipulation-type classification. A shared branch propagates manipulation-specific cues to the authenticity head, achieving multi-task cooperative optimization and yielding comparable or superior detection accuracy with high parameter efficiency.
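To make the adapter idea concrete, below is a minimal PyTorch-style sketch of a multi-head LoRA module wrapping a frozen linear projection inside a transformer block. The paper's code is not reproduced in this digest, so the class name `MultiHeadLoRA`, the learned router, and the default `rank`/`n_heads` values are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MultiHeadLoRA(nn.Module):
    """Hypothetical multi-head LoRA: several low-rank branches added to a
    frozen linear projection, mixed by a learned per-token router."""

    def __init__(self, base: nn.Linear, n_heads: int = 4, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # the backbone projection stays frozen
            p.requires_grad = False
        d_in, d_out = base.in_features, base.out_features
        self.down = nn.ModuleList(nn.Linear(d_in, rank, bias=False) for _ in range(n_heads))
        self.up = nn.ModuleList(nn.Linear(rank, d_out, bias=False) for _ in range(n_heads))
        self.router = nn.Linear(d_in, n_heads)  # forgery-aware head weighting (assumed)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, d_in); each LoRA head produces a low-rank update.
        weights = torch.softmax(self.router(x), dim=-1)               # (B, T, n_heads)
        delta = torch.stack([u(d(x)) for d, u in zip(self.down, self.up)], dim=-1)
        return self.base(x) + (delta * weights.unsqueeze(-2)).sum(-1)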

Abstract

The proliferation of sophisticated deepfakes poses significant threats to information integrity. While DINOv2 shows promise for detection, existing fine-tuning approaches treat it as generic binary classification, overlooking distinct artifacts inherent to different deepfake methods. To address this, we propose a DeepFake Fine-Grained Adapter (DFF-Adapter) for DINOv2. Our method incorporates lightweight multi-head LoRA modules into every transformer block, enabling efficient backbone adaptation. DFF-Adapter simultaneously addresses authenticity detection and fine-grained manipulation type classification, where classifying forgery methods enhances artifact sensitivity. We introduce a shared branch propagating fine-grained manipulation cues to the authenticity head. This enables multi-task cooperative optimization, explicitly enhancing authenticity discrimination with manipulation-specific knowledge. Utilizing only 3.5M trainable parameters, our parameter-efficient approach achieves detection accuracy comparable to or even surpassing that of current complex state-of-the-art methods.
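The dual supervision described in the abstract can be sketched as a joint objective over a binary authenticity loss and a fine-grained manipulation-type loss. The balance weight `lam` and the convention of masking real samples out of the type loss are assumptions; the digest does not specify either.

```python
import torch
import torch.nn.functional as F

def dual_supervision_loss(auth_logits, type_logits, is_fake, manip_type, lam=1.0):
    """Joint objective: authenticity detection + manipulation-type classification.
    `lam` balances the two tasks (value assumed, not stated in the paper digest)."""
    loss_auth = F.binary_cross_entropy_with_logits(auth_logits.squeeze(-1), is_fake.float())
    # Type supervision is applied only to fake samples (assumed convention;
    # an alternative is treating "real" as an extra class).
    fake_mask = is_fake.bool()
    if fake_mask.any():
        loss_type = F.cross_entropy(type_logits[fake_mask], manip_type[fake_mask])
    else:
        loss_type = auth_logits.new_zeros(())
    return loss_auth + lam * loss_type
```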


Key findings
The proposed DFF-Adapter achieves detection accuracy comparable to or surpassing state-of-the-art methods while using only 3.5M trainable parameters. In cross-dataset evaluations it demonstrates superior generalization, outperforming existing approaches by an average of 2.41 AUC points. It also generalizes well to unseen forgery techniques, securing the best overall performance in cross-manipulation evaluations on the DF40 dataset, which covers both GAN- and diffusion-based synthesis.
Approach
The approach fine-tunes a frozen DINOv2 backbone via a DeepFake Fine-Grained Adapter (DFF-Adapter) inserted into every transformer block. The adapter comprises lightweight multi-head LoRA modules and is trained with dual supervision: authenticity detection and fine-grained manipulation-type classification. A shared branch propagates manipulation-specific cues to the authenticity head, enabling multi-task cooperative optimization and enhancing artifact sensitivity, as sketched below.
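A similarly hedged sketch of how the shared branch could feed manipulation cues into the authenticity head. `SharedTaskFusion` is an illustrative stand-in under assumed dimensions, not the paper's SETF module.

```python
import torch
import torch.nn as nn

class SharedTaskFusion(nn.Module):
    """Sketch of a shared branch: manipulation-type features are fused into
    the authenticity head so binary detection can exploit fine-grained cues."""

    def __init__(self, dim: int, n_types: int):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(dim, dim), nn.GELU())
        self.type_head = nn.Linear(dim, n_types)   # fine-grained manipulation classes
        self.auth_head = nn.Linear(2 * dim, 1)     # real/fake, conditioned on shared cues

    def forward(self, feat: torch.Tensor):
        cues = self.shared(feat)                   # manipulation-specific cues
        type_logits = self.type_head(cues)
        auth_logits = self.auth_head(torch.cat([feat, cues], dim=-1))
        return auth_logits, type_logits
```

Conditioning the authenticity head on the concatenation of backbone features and shared cues is one simple way to realize the "propagation" the paper describes; other fusion schemes (addition, gating) would fit the same description.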
Datasets
FaceForensics++ (FF++), CDF-v1, CDF-v2, DFDCP, DFDC, DF40
Model(s)
DINOv2 (facebook/dinov2-with-registers-large checkpoint); DeepFake Fine-Grained Adapter (DFF-Adapter), comprising lightweight multi-head LoRA modules, a Forgery-Aware Multi-Head Router (FAMHR), and Shared-Enhanced Task Fusion (SETF).
Author countries
China