PUDD: Towards Robust Multi-modal Prototype-based Deepfake Detection

Authors: Alvaro Lopez Pellicier, Yi Li, Plamen Angelov

Published: 2024-06-22 19:21:42+00:00

AI Summary

PUDD, a prototype-based unified framework, addresses limitations in deepfake detection by comparing input data to known prototypes. It identifies deepfakes or unseen classes via similarity analysis, achieving high accuracy and efficiency.

Abstract

Deepfake techniques generate highly realistic data, making it challenging for humans to distinguish between actual and artificially generated images. Recent advancements in deep learning-based deepfake detection methods, particularly with diffusion models, have shown remarkable progress. However, there is a growing demand for real-world applications to detect unseen individuals, deepfake techniques, and scenarios. To address this limitation, we propose a Prototype-based Unified Framework for Deepfake Detection (PUDD). PUDD offers a detection system based on similarity, comparing input data against known prototypes for video classification and identifying potential deepfakes or previously unseen classes by analyzing drops in similarity. Our extensive experiments reveal three key findings: (1) PUDD achieves an accuracy of 95.1% on Celeb-DF, outperforming state-of-the-art deepfake detection methods; (2) PUDD leverages image classification as the upstream task during training, demonstrating promising performance in both image classification and deepfake detection tasks during inference; (3) PUDD requires only 2.7 seconds for retraining on new data and emits $10^{5}$ times less carbon than the state-of-the-art model, making it significantly more environmentally friendly.


Key findings
PUDD achieved 95.1% accuracy on Celeb-DF and 94.6% on CIFAKE, outperforming state-of-the-art methods. It retrains on new data in only 2.7 seconds and is far more environmentally friendly than comparable models. The model also shows promising performance on image classification.
Approach
PUDD uses a prototype learning layer to cluster prototypes from input video/image data. Similarity scores are computed between the input and the prototypes, and the input is assigned to the most similar known class. The m-σ rule, which flags inputs whose similarity falls more than m standard deviations below the statistics observed during training, is used to detect potential deepfakes or previously unseen classes.
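The prototype-matching decision described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: cosine similarity, the per-class similarity statistics, and the function names are all assumptions for the sake of the example.

```python
import numpy as np

def classify_with_prototypes(feature, prototypes, sim_mean, sim_std, m=3.0):
    """Sketch of prototype-based classification with an m-sigma check.

    feature     -- 1-D feature vector for the input (e.g. from a backbone).
    prototypes  -- dict mapping class name -> (n_prototypes, d) array.
    sim_mean/sim_std -- dicts of per-class similarity statistics gathered
                        on training data (assumed to be available).
    """
    best_class, best_sim = None, -np.inf
    for cls, protos in prototypes.items():
        # Cosine similarity between the input and each class prototype
        # (the similarity measure here is an assumption).
        sims = protos @ feature / (
            np.linalg.norm(protos, axis=1) * np.linalg.norm(feature) + 1e-12
        )
        sim = sims.max()
        if sim > best_sim:
            best_class, best_sim = cls, sim
    # m-sigma rule: a similarity far below the class's training statistics
    # suggests a deepfake or a previously unseen class.
    if best_sim < sim_mean[best_class] - m * sim_std[best_class]:
        return "potential_deepfake_or_unseen", best_sim
    return best_class, best_sim
```

An input close to a known prototype is classified normally, while one whose best similarity drops below the m-σ threshold is flagged, which is how a single similarity mechanism serves both classification and deepfake detection.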
Datasets
Celeb-DF, CIFAKE
Model(s)
DINOv2 (feature extraction), a custom xDNN (classification)
Author countries
UK