Video Manipulations Beyond Faces: A Dataset with Human-Machine Analysis

Authors: Trisha Mittal, Ritwik Sinha, Viswanathan Swaminathan, John Collomosse, Dinesh Manocha

Published: 2022-07-26 17:39:04+00:00

AI Summary

The paper introduces VideoSham, a new dataset of 826 videos (413 real, 413 manipulated using 6 diverse attacks beyond facial manipulations) to advance research in video manipulation detection. The authors demonstrate that state-of-the-art algorithms perform poorly on this dataset, highlighting the need for more robust methods.

Abstract

As tools for content editing mature, and artificial intelligence (AI) based algorithms for synthesizing media grow, the presence of manipulated content across online media is increasing. This phenomenon causes the spread of misinformation, creating a greater need to distinguish between "real" and "manipulated" content. To this end, we present VideoSham, a dataset consisting of 826 videos (413 real and 413 manipulated). Many of the existing deepfake datasets focus exclusively on two types of facial manipulations -- swapping with a different subject's face or altering the existing face. VideoSham, on the other hand, contains more diverse, context-rich, and human-centric, high-resolution videos manipulated using a combination of 6 different spatial and temporal attacks. Our analysis shows that state-of-the-art manipulation detection algorithms only work for a few specific attacks and do not scale well on VideoSham. We performed a user study on Amazon Mechanical Turk with 1200 participants to understand if they can differentiate between the real and manipulated videos in VideoSham. Finally, we dig deeper into the strengths and weaknesses of performances by humans and SOTA algorithms to identify gaps that need to be filled with better AI algorithms. We present the dataset at https://github.com/adobe-research/VideoSham-dataset.


Key findings
State-of-the-art deepfake detection and video forensic algorithms achieve less than 50% accuracy on VideoSham. Human participants also struggle to reliably detect many of the manipulations, particularly spatial ones. The results highlight the limitations of current techniques and the need for more robust and generalized methods.
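The reported numbers are standard binary-classification accuracies on a balanced set (413 real, 413 manipulated videos), so scores below 50% mean a detector does worse than chance. Below is a minimal sketch of how such overall and per-attack accuracies could be tabulated; the record fields (label, prediction, attack_type) are illustrative assumptions, not VideoSham's actual metadata schema.

```python
from collections import defaultdict

def per_attack_accuracy(records):
    """Tabulate overall and per-attack accuracy for a binary detector.

    Each record is assumed to be a dict with:
      'label'       - 1 if the video is manipulated, 0 if real
      'prediction'  - detector output, 1 = manipulated, 0 = real
      'attack_type' - one of the 6 attack names, or 'none' for real videos
    (These field names are hypothetical; VideoSham's metadata may differ.)
    """
    totals, correct = defaultdict(int), defaultdict(int)
    for r in records:
        for key in ("overall", r["attack_type"]):
            totals[key] += 1
            correct[key] += int(r["prediction"] == r["label"])
    return {key: correct[key] / totals[key] for key in totals}

# Dummy usage: a detector that always predicts "real" scores 50% overall
# on a balanced set and 0% on every attack category.
records = (
    [{"label": 0, "prediction": 0, "attack_type": "none"}] * 5
    + [{"label": 1, "prediction": 0, "attack_type": "attack-1"}] * 5
)
print(per_attack_accuracy(records))
```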
Approach
The authors create a new dataset, VideoSham, containing diverse video manipulations beyond facial deepfakes. They then evaluate existing state-of-the-art deepfake detection and video forensic algorithms on it and conduct a user study with 1200 Amazon Mechanical Turk participants to assess human detection capabilities.
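Frame-level detectors such as MesoNet output per-frame scores that must be aggregated into a video-level decision before comparison with VideoSham's video-level labels. Below is a minimal sketch of that evaluation step, assuming OpenCV for frame extraction and a placeholder scoring function standing in for an actual detector; the file path in the usage comment is illustrative only.

```python
import cv2
import numpy as np

def score_frame(frame):
    """Placeholder for a per-frame manipulation detector (e.g., MesoNet).
    Returns a score in [0, 1]; higher means 'more likely manipulated'."""
    return 0.5  # dummy value; plug in a real model here

def score_video(path, sample_every=10, threshold=0.5):
    """Sample frames from a video, score each, and average into a video-level decision."""
    cap = cv2.VideoCapture(path)
    scores, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % sample_every == 0:
            scores.append(score_frame(frame))
        idx += 1
    cap.release()
    mean_score = float(np.mean(scores)) if scores else 0.0
    return mean_score, mean_score >= threshold  # (score, predicted 'manipulated')

# Usage (hypothetical path, not an actual VideoSham file name):
# score, is_manipulated = score_video("VideoSham/manipulated/clip_001.mp4")
```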
Datasets
VideoSham (created by the authors); existing deepfake datasets referenced for comparison include DF-TIMIT, FaceForensics++, DeeperForensics-1.0, and WildDeepFake.
Model(s)
Li et al. [32], MesoNet [2], Mittal et al. [40], Long et al. [37], Liu et al. [36]
Author countries
USA, UK