On Improving Cross-dataset Generalization of Deepfake Detectors

Authors: Aakash Varma Nadimpalli, Ajita Rattani

Published: 2022-04-08 20:34:53+00:00

AI Summary

This paper proposes a hybrid of supervised and reinforcement learning (RL) to improve the cross-dataset generalization of deepfake detectors. An RL agent selects the top-k augmentations for each test image, and the CNN-based classifier's scores over those augmented versions are averaged to produce the final real-or-fake decision.

Abstract

Facial manipulation by deep fake has caused major security risks and raised severe societal concerns. As a countermeasure, a number of deep fake detection methods have been proposed recently. Most of them model deep fake detection as a binary classification problem using a backbone convolutional neural network (CNN) architecture pretrained for the task. These CNN-based methods have demonstrated very high efficacy in deep fake detection with the Area under the Curve (AUC) as high as 0.99. However, the performance of these methods degrades significantly when evaluated across datasets. In this paper, we formulate deep fake detection as a hybrid combination of supervised and reinforcement learning (RL) to improve its cross-dataset generalization performance. The proposed method chooses the top-k augmentations for each test sample by an RL agent in an image-specific manner. The classification scores, obtained using CNN, of all the augmentations of each test image are averaged together for final real or fake classification. Through extensive experimental validation, we demonstrate the superiority of our method over existing published research in cross-dataset generalization of deep fake detectors, thus obtaining state-of-the-art performance.

Key findings

The proposed method improves cross-dataset generalization over existing published methods and reports state-of-the-art results on Celeb-DF. Image-specific augmentation selection by an RL agent is shown to be superior to random augmentation, and the PPO-based agent outperforms the DQN-based agent.

Approach

The method uses a CNN for deepfake classification and an RL agent to select the top-k augmentations for each test image. The CNN's classification scores for the augmented images are averaged to obtain the final classification result. The RL agent is trained to maximize the reward, defined as the difference in classification loss before and after augmentation.
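
The two mechanisms described above, the reward signal and the averaging of scores over the selected augmentations, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the use of binary cross-entropy as the classification loss, and the sign convention (loss before minus loss after, so that a helpful augmentation earns a positive reward) are assumptions made for illustration. The `policy_scores` argument is a hypothetical stand-in for whatever per-augmentation preference the trained RL agent outputs, and `classifier` is any callable that returns a fake probability.

```python
import numpy as np

def bce_loss(p_fake, label, eps=1e-7):
    """Binary cross-entropy for one sample (label 1 = fake, 0 = real).
    Assumed here as the classification loss; the summary does not name it."""
    p = float(np.clip(p_fake, eps, 1.0 - eps))
    return float(-(label * np.log(p) + (1 - label) * np.log(1.0 - p)))

def augmentation_reward(loss_before, loss_after):
    """Reward as the change in classification loss caused by an augmentation.
    Sign convention is an assumption: positive means the loss went down."""
    return loss_before - loss_after

def top_k_averaged_score(image, augmentations, policy_scores, classifier, k=3):
    """Apply the k augmentations the policy ranks highest, classify each
    augmented image, and average the resulting fake probabilities."""
    order = np.argsort(policy_scores)[::-1][:k]  # indices, best first
    scores = [classifier(augmentations[i](image)) for i in order]
    return float(np.mean(scores)), order.tolist()

# Toy usage with stand-in components (all hypothetical).
toy_image = np.full((2, 2), 0.5)
toy_augmentations = [
    lambda x: x,                         # identity
    lambda x: x[:, ::-1],                # horizontal flip
    lambda x: np.clip(x + 0.2, 0, 1),    # brighten
    lambda x: x * 0.5,                   # darken
]
toy_classifier = lambda x: float(x.mean())   # pretend "fake probability"
toy_policy_scores = [0.1, 0.9, 0.8, 0.05]    # pretend agent preferences

score, chosen = top_k_averaged_score(
    toy_image, toy_augmentations, toy_policy_scores, toy_classifier, k=2
)
print("selected augmentations:", chosen, "averaged score:", score)
```

In this sketch the final decision would come from thresholding the averaged score. During agent training, `augmentation_reward` would be evaluated on labelled samples by comparing the loss on the original image with the loss on the augmented one.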

Datasets

FaceForensics++, DeeperForensics-1.0, Celeb-DF

Model(s)

ResNet-50, InceptionNet-v3, EfficientNet v2-L, XceptionNet; Proximal Policy Optimization (PPO) and Deep Q Network (DQN) for reinforcement learning.

Author countries

USA
