A Lightweight and Interpretable Deepfakes Detection Framework

Authors: Muhammad Umar Farooq, Ali Javed, Khalid Mahmood Malik, Muhammad Anas Raza

Published: 2025-01-21 07:03:11+00:00

Journal Ref: International Conference of Advanced Engineering, Technology and Applications, 2021

AI Summary

This paper introduces a unified, lightweight, and interpretable framework for detecting the three main types of deepfakes: face-swap, lip-sync, and puppet-master. It relies on a novel feature fusion that combines hybrid facial landmarks with new heart rate features. These fused features are used to train an XGBoost classifier, which outperforms the comparative deepfake detection methods and performs comparably to a deep learning baseline (LSTM-FCN) while offering greater interpretability.

Abstract

The recent realistic creation and dissemination of so-called deepfakes poses a serious threat to social life, civil order, and the law. Celebrity defamation, election manipulation, and the use of deepfakes as evidence in a court of law are a few potential consequences. The availability of open-source pretrained models built on modern frameworks such as PyTorch or TensorFlow, video manipulation apps such as FaceApp and REFACE, and affordable computing infrastructure has eased the creation of deepfakes. Most existing detectors focus on detecting either face-swap, lip-sync, or puppet-master deepfakes; a unified framework that detects all three types has hardly been explored. This paper presents a unified framework that exploits the proposed fusion of hybrid facial landmark features and our novel heart rate features to detect all three types of deepfakes. We propose novel heart rate features and fuse them with the facial landmark features to better capture both the facial artifacts of fake videos and the natural variations present in original videos. We use these features to train a lightweight XGBoost classifier that distinguishes deepfake from bonafide videos. We evaluated our framework on the world leaders dataset (WLDR), which contains all three types of deepfakes. Experimental results show that the proposed framework offers superior detection performance over the comparative deepfake detection methods. A comparison against LSTM-FCN, a representative deep learning model, shows that the proposed model achieves similar results while being more interpretable.
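The abstract does not spell out how the heart rate features are computed. As a rough illustration only, a common remote-photoplethysmography (rPPG) heuristic estimates a pulse rate for one facial region by taking the per-frame mean intensity of that region (often the green channel) and finding the dominant frequency inside a plausible heart-rate band. The sketch below assumes that heuristic, a 0.7 to 4 Hz band, and the hypothetical names `dominant_frequency_bpm` and `heart_rate_per_roi`; it is not the paper's 63-D feature definition.

```python
import math


def dominant_frequency_bpm(trace, fps, lo_hz=0.7, hi_hz=4.0):
    """Strongest frequency of a per-frame ROI intensity trace, in beats per minute.

    Generic rPPG-style heuristic (assumed), not the paper's feature definition.
    Returns 0.0 if no DFT bin falls inside [lo_hz, hi_hz].
    """
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]  # remove the constant (DC) component
    best_hz, best_power = 0.0, -1.0
    # Naive DFT evaluated only at bins inside the plausible heart-rate band.
    for k in range(1, n // 2 + 1):
        hz = k * fps / n
        if not lo_hz <= hz <= hi_hz:
            continue
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(centered))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_hz, best_power = hz, power
    return best_hz * 60.0


def heart_rate_per_roi(roi_traces, fps):
    """Map each (hypothetical) ROI name to its estimated pulse rate in bpm."""
    return {name: dominant_frequency_bpm(trace, fps) for name, trace in roi_traces.items()}
```

For example, a synthetic 1.2 Hz sinusoid sampled at 30 fps for 10 seconds yields approximately 72 bpm. A naive DFT keeps the sketch dependency-free; a practical pipeline would use an FFT together with detrending and band-pass filtering.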


Key findings
The proposed framework achieved an AUC of 0.9505 on the WLDR dataset, outperforming comparative deepfake detection methods. It demonstrated comparable performance to the deep learning model LSTM-FCN (0.95 AUC) for segment-level detection. A key finding is that the XGBoost-based framework is lightweight and offers better interpretability compared to black-box deep learning models.
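AUC here is the area under the ROC curve. A minimal sketch of how such a figure can be computed from classifier scores uses the equivalent rank formulation: the probability that a randomly chosen deepfake receives a higher score than a randomly chosen bonafide video, with ties counted as half. Treating deepfake as the positive class and the function name `roc_auc` are assumptions for illustration.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of random positive > score of random negative).

    labels: 1 = deepfake (assumed positive class), 0 = bonafide.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    # Compare every positive/negative pair (quadratic, fine for illustration).
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, three of the four positive/negative pairs are ordered correctly, so the AUC is 0.75. Library implementations obtain the same value by sorting the scores instead of comparing every pair.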
Approach
The framework extracts 850-D facial landmark features using OpenFace2 and novel 63-D heart rate features from seven specified facial regions of interest. These features are standardized, fused, and then input into a lightweight XGBoost classifier. The model performs classification at both frame and segment levels to distinguish between bonafide and deepfake videos.
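The standardize-then-fuse step and the frame-to-segment decision can be sketched in plain Python. This assumes that fusion means concatenating the two standardized vectors (850 + 63 = 913 values per sample), that standardization statistics are estimated on training data only, and that a segment score is the mean of its per-frame fake probabilities; all function names are hypothetical. The classifier itself is omitted: the fused vectors would be passed to an XGBoost model such as `xgboost.XGBClassifier`.

```python
import statistics

LANDMARK_DIM = 850   # facial landmark features per sample (from the paper)
HEART_RATE_DIM = 63  # heart rate features per sample (from the paper)


def fit_standardizer(rows):
    """Per-column mean and population std, estimated on training rows only."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # A constant column has std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardize(row, means, stds):
    """Z-score one feature vector with previously fitted statistics."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def fuse(landmark_vec, heart_rate_vec):
    """Assumed early fusion: concatenate the two standardized vectors (913 values)."""
    if len(landmark_vec) != LANDMARK_DIM or len(heart_rate_vec) != HEART_RATE_DIM:
        raise ValueError("unexpected feature dimensions")
    return list(landmark_vec) + list(heart_rate_vec)


def segment_decision(frame_fake_probs, threshold=0.5):
    """Assumed segment rule: mean of per-frame fake probabilities vs. a threshold."""
    score = statistics.fmean(frame_fake_probs)
    return score, score >= threshold


if __name__ == "__main__":
    means, stds = fit_standardizer([[1.0, 2.0], [3.0, 2.0]])
    print(standardize([3.0, 2.0], means, stds))  # [1.0, 0.0]
    print(len(fuse([0.0] * LANDMARK_DIM, [0.0] * HEART_RATE_DIM)))  # 913
```

Averaging frame probabilities is only one simple aggregation rule; a majority vote over frame-level labels would be an equally plausible alternative under these assumptions.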
Datasets
world leaders dataset (WLDR)
Model(s)
XGBoost (compared against the deep learning model LSTM-FCN)
Author countries
Pakistan, USA