Human Action CLIPs: Detecting AI-generated Human Motion
Authors: Matyas Bohacek, Hany Farid
Published: 2024-11-30 16:20:58+00:00
AI Summary
This paper introduces a robust technique for distinguishing real from AI-generated human motion in videos using multi-modal semantic embeddings, specifically CLIP embeddings. The method is shown to be resilient to common video manipulations like resolution and compression attacks and generalizes well to unseen AI models.
Abstract
AI video generation continues its journey through the uncanny valley, producing content that is increasingly perceptually indistinguishable from reality. To better protect individuals, organizations, and societies from its malicious applications, we describe an effective and robust technique for distinguishing real from AI-generated human motion using multi-modal semantic embeddings. Our method is robust to the types of laundering that typically confound lower-level forensic approaches, including resolution and compression attacks. This method is evaluated against DeepAction, a custom-built, open-sourced dataset of video clips with human actions generated by seven text-to-video AI models alongside matching real footage. The dataset is available under an academic license at https://www.huggingface.co/datasets/faridlab/deepaction_v1.
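The abstract does not specify the classifier applied to the semantic embeddings. The general idea of separating real from AI-generated videos in an embedding space can be sketched as follows; this is a minimal illustration, not the authors' method, and it substitutes synthetic 512-dimensional vectors for actual CLIP embeddings so the example is self-contained. The nearest-centroid rule and the cluster parameters are assumptions for illustration only.

```python
import numpy as np

# Synthetic stand-ins for per-video CLIP-style embeddings (hypothetical:
# real CLIP features would come from a vision-language model, not noise).
rng = np.random.default_rng(0)
DIM = 512
real_train = rng.normal(loc=0.0, scale=1.0, size=(100, DIM))  # label 0: real
fake_train = rng.normal(loc=0.5, scale=1.0, size=(100, DIM))  # label 1: AI

# Nearest-centroid classifier: one prototype vector per class.
real_centroid = real_train.mean(axis=0)
fake_centroid = fake_train.mean(axis=0)

def classify(embedding: np.ndarray) -> int:
    """Return 0 (real) or 1 (AI-generated) by distance to class centroids."""
    d_real = np.linalg.norm(embedding - real_centroid)
    d_fake = np.linalg.norm(embedding - fake_centroid)
    return int(d_fake < d_real)

# Held-out synthetic queries drawn from each class distribution.
real_test = rng.normal(loc=0.0, scale=1.0, size=(50, DIM))
fake_test = rng.normal(loc=0.5, scale=1.0, size=(50, DIM))
preds_real = [classify(e) for e in real_test]
preds_fake = [classify(e) for e in fake_test]
accuracy = (preds_real.count(0) + preds_fake.count(1)) / 100
```

In a real pipeline, the embeddings would be extracted from video frames with a pretrained CLIP model, and a learned classifier (e.g., logistic regression or a small MLP) would typically replace the centroid rule.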