Robust Sequential DeepFake Detection

Authors: Rui Shao, Tianxing Wu, Ziwei Liu

Published: 2023-09-26 15:01:43+00:00

Comment: IJCV 2025. Extension of our ECCV 2022 paper: arXiv:2207.02204. Code: https://github.com/rshaojimmy/SeqDeepFake

AI Summary

This paper introduces a novel problem, "Detecting Sequential DeepFake Manipulation (Seq-DeepFake)", which aims to predict a sequence of facial manipulation operations on a given face image rather than a simple binary fake/real label. To support this, the authors construct the first Seq-DeepFake dataset with sequential manipulation annotations, including a perturbed version (Seq-DeepFake-P) to mimic real-world scenarios. They propose two Transformer-based models, SeqFakeFormer and SeqFakeFormer++, for robust image-to-sequence detection of these manipulations.

Abstract

Since photorealistic faces can now be readily generated by facial manipulation technologies, the potential for malicious abuse of these technologies has drawn great concern. Numerous deepfake detection methods have thus been proposed. However, existing methods focus only on detecting one-step facial manipulation. With the emergence of easily accessible facial editing applications, people can manipulate facial components through multi-step operations in a sequential manner. This new threat requires detecting a sequence of facial manipulations, which is vital both for detecting deepfake media and for recovering original faces afterwards. Motivated by this observation, we emphasize the need for, and propose, a novel research problem: Detecting Sequential DeepFake Manipulation (Seq-DeepFake). Unlike the existing deepfake detection task, which demands only a binary label prediction, detecting Seq-DeepFake manipulation requires correctly predicting a sequential vector of facial manipulation operations. To support a large-scale investigation, we construct the first Seq-DeepFake dataset, where face images are manipulated sequentially with corresponding annotations of the sequential facial manipulation vectors. Based on this new dataset, we cast detecting Seq-DeepFake manipulation as a specific image-to-sequence task and propose a concise yet effective Seq-DeepFake Transformer (SeqFakeFormer). To better reflect real-world deepfake data distributions, we further apply various perturbations to the original Seq-DeepFake dataset, constructing the more challenging Sequential DeepFake dataset with perturbations (Seq-DeepFake-P). To exploit deeper correlations between images and sequences when facing Seq-DeepFake-P, we devise a dedicated Seq-DeepFake Transformer with Image-Sequence Reasoning (SeqFakeFormer++), which builds stronger correspondence between image-sequence pairs for more robust Seq-DeepFake detection.


Key findings

The proposed SeqFakeFormer and SeqFakeFormer++ models significantly outperform baselines and state-of-the-art deepfake detection methods on both the clean Seq-DeepFake and the challenging perturbed Seq-DeepFake-P datasets. Detecting sequential manipulations with adaptive lengths proves to be much harder than with fixed lengths. The deeper image-sequence reasoning modules in SeqFakeFormer++ are crucial for maintaining robustness against various post-processing perturbations, and the detected sequences are shown to be highly useful for downstream tasks like face recovery.
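The gap between fixed- and adaptive-length evaluation can be made concrete with a small sketch. The routine below is a hypothetical illustration, not the authors' released evaluation code: it assumes short sequences are padded to a fixed length with a `no-manipulation` token, so a fixed-length score rewards correct padding positions, while an adaptive-length score is computed only over the union of predicted and ground-truth manipulation steps and therefore also penalizes getting the sequence length wrong.

```python
# Hypothetical sketch of fixed- vs adaptive-length sequence scoring
# (the padding scheme and metric definitions are assumptions, not the
# authors' exact Fixed-Acc / Adaptive-Acc implementations).

NO_OP = "no-manipulation"  # assumed padding token
MAX_LEN = 5                # Seq-DeepFake sequences have at most 5 steps


def pad(seq):
    """Pad a manipulation sequence to MAX_LEN with NO_OP tokens."""
    return seq + [NO_OP] * (MAX_LEN - len(seq))


def fixed_acc(pred, gt):
    """Per-position accuracy over padded, fixed-length sequences."""
    p, g = pad(pred), pad(gt)
    return sum(a == b for a, b in zip(p, g)) / MAX_LEN


def adaptive_acc(pred, gt):
    """Per-position accuracy over only the manipulated positions, so the
    model must also predict the sequence length correctly."""
    n = max(len(pred), len(gt), 1)
    p, g = pad(pred), pad(gt)
    return sum(p[i] == g[i] for i in range(n)) / n


gt = ["eyebrow", "lip"]
print(fixed_acc(["eyebrow", "lip"], gt))           # 1.0: exact match
print(fixed_acc(["eyebrow", "nose", "lip"], gt))   # 0.6: padding still matches
print(adaptive_acc(["eyebrow", "nose", "lip"], gt))  # ~0.33: length error hurts
```

Note how a wrong-length prediction keeps a respectable fixed-length score purely from matching padding tokens, while the adaptive score drops sharply, which is consistent with the finding that adaptive-length detection is the harder setting.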

Approach

The authors propose SeqFakeFormer, a Transformer-based model that uses a CNN for spatial feature extraction and an Image Encoder to capture spatial manipulation traces via self-attention. A Sequence Decoder with Spatially Enhanced Cross-Attention (SECA) then models sequential relations in an auto-regressive manner to detect manipulation sequences. For improved robustness against real-world perturbations, SeqFakeFormer++ is introduced, which further integrates Image-Sequence Contrastive Learning (ISC) and Image-Sequence Matching (ISM) to establish deeper cross-modal correlations between images and sequences.
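The auto-regressive decoding step can be sketched schematically. The loop below is a minimal, hypothetical interface (not the released SeqFakeFormer API): starting from a start token, the decoder repeatedly conditions on the image features and the previously predicted operations until it emits an end token or reaches the maximum sequence length, so manipulation sequences of adaptive length fall out naturally.

```python
# Hypothetical greedy auto-regressive decoding loop for predicting a
# manipulation sequence token by token (function and token names are
# assumptions for illustration, not the SeqFakeFormer codebase).

SOS, EOS = "<sos>", "<eos>"
MAX_STEPS = 5  # Seq-DeepFake annotations contain at most 5 manipulations


def decode_sequence(image_features, step_fn, max_steps=MAX_STEPS):
    """Greedily decode a manipulation sequence.

    step_fn(image_features, prefix) stands in for one decoder pass
    (self-attention over the token prefix plus spatially enhanced
    cross-attention over the image features) and returns the next token.
    """
    prefix = [SOS]
    for _ in range(max_steps):
        token = step_fn(image_features, prefix)
        if token == EOS:  # decoder signals the sequence is complete
            break
        prefix.append(token)
    return prefix[1:]  # drop the start token


# Toy step function that replays a fixed answer, for illustration only.
def toy_step(features, prefix):
    answer = ["eyebrow", "hair", EOS]
    return answer[len(prefix) - 1]


print(decode_sequence(None, toy_step))  # ['eyebrow', 'hair']
```

In this framing, an unmanipulated (real) face simply corresponds to the decoder emitting the end token immediately, which is how an image-to-sequence model subsumes the usual binary real/fake decision.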

Datasets

Seq-DeepFake dataset (created by authors, based on CelebA-HQ, CelebAMask-HQ, FFHQ, StyleMapGAN, and Jiang et al.'s [14] facial editing method), Seq-DeepFake-P dataset (Seq-DeepFake with various perturbations applied).

Model(s)

SeqFakeFormer, SeqFakeFormer++, ResNet-18, ResNet-34, ResNet-50 (pre-trained on ImageNet) as backbones.

Author countries

China, Singapore