TAR: Generalized Forensic Framework to Detect Deepfakes using Weakly Supervised Learning

Authors: Sangyup Lee, Shahroz Tariq, Junyaup Kim, Simon S. Woo

Published: 2021-05-13 07:31:08+00:00

AI Summary

This paper introduces TAR, a Transfer learning-based Autoencoder with Residuals, for generalized deepfake video detection. TAR uses a sequential transfer learning approach with a small number of training samples to achieve high accuracy across various deepfake generation methods and demonstrates strong performance on real-world deepfakes.

Abstract

Deepfakes have become a critical social problem, and detecting them is of utmost importance. Moreover, deepfake generation methods are advancing, and they are becoming harder to detect. While many deepfake detection models can detect different types of deepfakes separately, they perform poorly when generalizing detection performance across multiple types of deepfakes. This motivates us to develop a generalized model to detect different types of deepfakes. Therefore, in this work, we introduce a practical digital forensic tool that detects different types of deepfakes simultaneously and propose the Transfer learning-based Autoencoder with Residuals (TAR). The ultimate goal of our work is to develop a unified model that detects various types of deepfake videos with high accuracy, using only a small number of training samples, and that works well in real-world settings. We develop an autoencoder-based detection model with residual blocks and sequentially perform transfer learning to detect different types of deepfakes simultaneously. Our approach achieves a much higher generalized detection performance than the state-of-the-art methods on the FaceForensics++ dataset. In addition, we evaluate our model on 200 real-world Deepfake-in-the-Wild (DW) videos of 50 celebrities available on the Internet and achieve 89.49% zero-shot accuracy, which is significantly higher than the best baseline model (a gain of 10.77%), demonstrating and validating the practicality of our approach.


Key findings
TAR significantly outperforms state-of-the-art methods in generalized deepfake detection on the FaceForensics++ dataset. It achieves 89.49% zero-shot accuracy on the real-world Deepfake-in-the-Wild dataset, significantly higher than the best baseline. The model's effectiveness is further enhanced by incorporating residual blocks with the Leaky ReLU activation function (sketched below).
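
The snippet below is a minimal sketch of a residual block with Leaky ReLU of the kind the paper incorporates into its autoencoder; the channel count, kernel size, and negative slope are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Convolutional residual block with Leaky ReLU (illustrative sizes)."""
    def __init__(self, channels: int = 64, negative_slope: float = 0.2):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.act = nn.LeakyReLU(negative_slope)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The skip connection lets the block learn a residual on top of its
        # input, which eases optimization when blocks are stacked in the encoder.
        out = self.act(self.conv1(x))
        out = self.conv2(out)
        return self.act(out + x)
```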
Approach
TAR utilizes an autoencoder architecture with residual blocks and a facilitator module to learn deep features of deepfakes. It employs sequential transfer learning: the model is first trained on one deepfake manipulation type, and the learned weights are then transferred to each subsequent type using only a few samples per dataset (see the training sketch below).
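
Below is a minimal sketch of the sequential transfer-learning loop under stated assumptions: a reconstruction-style autoencoder objective, Adam optimization, and placeholder data loaders per manipulation type (e.g. Deepfakes, Face2Face, FaceSwap, NeuralTextures from FaceForensics++). The function name, epoch counts, and learning rate are illustrative, and the facilitator module is not modeled here.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def fine_tune(model: nn.Module, loader: DataLoader, epochs: int, lr: float = 1e-4) -> nn.Module:
    """Fine-tune the same autoencoder on one manipulation type's (small) training set."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()  # reconstruction loss stands in for the paper's full objective
    model.train()
    for _ in range(epochs):
        for images, _labels in loader:
            optimizer.zero_grad()
            reconstruction = model(images)
            loss = criterion(reconstruction, images)
            loss.backward()
            optimizer.step()
    return model

# Sequential transfer: train on the first manipulation type, then carry the
# learned weights over to each remaining type using only a few samples each.
# `autoencoder` and the per-type loaders are assumed to be defined elsewhere.
# model = fine_tune(autoencoder, deepfakes_loader, epochs=10)
# for few_shot_loader in (face2face_loader, faceswap_loader, neuraltextures_loader):
#     model = fine_tune(model, few_shot_loader, epochs=3)
```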
Datasets
FaceForensics++, Deepfake-in-the-Wild (DW)
Model(s)
Transfer learning-based Autoencoder with Residuals (TAR)
Author countries
South Korea