TT-DF: A Large-Scale Diffusion-Based Dataset and Benchmark for Human Body Forgery Detection

Authors: Wenkui Yang, Zhida Zhang, Xiaoqiang Zhou, Junxian Duan, Jie Cao

Published: 2025-05-13 11:01:25+00:00

AI Summary

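This paper introduces TikTok-DeepFake (TT-DF), a large-scale diffusion-based dataset of 6,120 forged videos (1,378,857 synthetic frames) built for human body forgery detection, an area that has lacked dedicated datasets. It also proposes the Temporal Optical Flow Network (TOF-Net), a detection model that exploits spatiotemporal inconsistencies and differences in optical flow distribution between natural and forged videos. On TT-DF, TOF-Net outperforms state-of-the-art facial forgery detection models extended to body forgery.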

Abstract

The emergence and popularity of facial deepfake methods spur the vigorous development of deepfake datasets and facial forgery detection, which to some extent alleviates the security concerns about facial-related artificial intelligence technologies. However, when it comes to human body forgery, there has been a persistent lack of datasets and detection methods, due to the later inception and complexity of human body generation methods. To mitigate this issue, we introduce TikTok-DeepFake (TT-DF), a novel large-scale diffusion-based dataset containing 6,120 forged videos with 1,378,857 synthetic frames, specifically tailored for body forgery detection. TT-DF offers a wide variety of forgery methods, involving multiple advanced human image animation models utilized for manipulation, two generative configurations based on the disentanglement of identity and pose information, as well as different compressed versions. The aim is to simulate any potential unseen forged data in the wild as comprehensively as possible, and we also furnish a benchmark on TT-DF. Additionally, we propose an adapted body forgery detection model, Temporal Optical Flow Network (TOF-Net), which exploits the spatiotemporal inconsistencies and optical flow distribution differences between natural data and forged data. Our experiments demonstrate that TOF-Net achieves favorable performance on TT-DF, outperforming current state-of-the-art extendable facial forgery detection models. For our TT-DF dataset, please refer to https://github.com/HashTAG00002/TT-DF.

Key findings

TOF-Net outperformed the baseline models (Xception, TALL-Swin, BAR-Net) on the TT-DF dataset. It generalized well across different forgery configurations and manipulation methods, and its motion-guided branch proved particularly effective.

Approach

The authors address body forgery detection by creating the TT-DF dataset, which includes various forgery methods and compression levels. They propose TOF-Net, a model using two branches: one for spatiotemporal attention on identity information and another for motion-guided optical flow analysis of pose information.
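
To make the optical flow idea concrete, here is a minimal sketch of motion estimation between two frames. It is not TOF-Net's flow estimator, which this summary does not specify. It is a toy, global Lucas-Kanade-style least-squares fit that returns one displacement for the whole frame, written with the standard library only. The names gaussian_frame and estimate_global_flow, and the synthetic blob frames, are hypothetical and exist only for illustration.

```python
import math


def gaussian_frame(size, cx, cy, sigma=6.0):
    """Hypothetical test frame: one smooth grayscale blob centred at (cx, cy)."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]


def estimate_global_flow(prev, curr):
    """Fit one (u, v) displacement, in pixels, to the brightness-constancy
    constraint Ix*u + Iy*v + It = 0 over all interior pixels (least squares)."""
    height, width = len(prev), len(prev[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            ix = (prev[y][x + 1] - prev[y][x - 1]) / 2.0  # spatial gradient in x
            iy = (prev[y + 1][x] - prev[y - 1][x]) / 2.0  # spatial gradient in y
            it = curr[y][x] - prev[y][x]                  # temporal difference
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("frame has too little texture to estimate motion")
    # Solve the 2x2 normal equations [sxx sxy; sxy syy] [u v]^T = [-sxt -syt]^T.
    u = (-sxt * syy + syt * sxy) / det
    v = (-syt * sxx + sxt * sxy) / det
    return u, v


if __name__ == "__main__":
    prev_frame = gaussian_frame(48, 24.0, 24.0)
    curr_frame = gaussian_frame(48, 24.5, 23.7)  # blob moved by (+0.5, -0.3) pixels
    print(estimate_global_flow(prev_frame, curr_frame))  # close to (0.5, -0.3)
```

A real detector would compute a dense, per-pixel flow field and feed its statistics to a learned model. The single global vector here only illustrates the kind of motion signal that a motion-guided branch could compare between natural and forged videos.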

Datasets

TikTok-DeepFake (TT-DF) dataset, created by the authors; TikTok dataset for data generation.

Model(s)

Temporal Optical Flow Network (TOF-Net); Xception, TALL-Swin, BAR-Net (used as baselines).

Author countries

China
